5 Best Ways to Concatenate Two or More Pandas DataFrames Along Rows

💡 Problem Formulation: When working with data in Python, analysts often need to combine multiple datasets into one comprehensive DataFrame. The pandas library offers powerful tools for this. Say a data analyst has several DataFrames representing different months of sales data; they aim to create a single DataFrame with sales data for the entire year. This article shows how to concatenate two or more DataFrames along rows, forming a unified dataset.

Method 1: Using the pd.concat() function

The pd.concat() function concatenates pandas objects along a chosen axis. With axis=0, which is also the default, it stacks DataFrames vertically, row by row. It accepts any number of DataFrames in a single call, and the inputs may have the same or different columns; a column missing from one input is filled with NaN for that input's rows.

Here’s an example:

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenating DataFrames
result = pd.concat([df1, df2], axis=0)

print(result)

Output:

   A  B
0  1  3
1  2  4
0  5  7
1  6  8

This code snippet demonstrates concatenating two DataFrames, df1 and df2, into a single DataFrame result. Rows from df2 are placed after those of df1 along axis 0, the row axis, which is also the default. The original index labels are preserved, so the result contains duplicate index values (0, 1, 0, 1).
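Method 1 also covers inputs whose columns differ. The following is a minimal sketch of that case; the frames left and right are hypothetical and not part of the example above. It compares the default join='outer' with join='inner'.

```python
import pandas as pd

# Hypothetical frames that share only column 'A'
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})

# Default join='outer': union of all columns, gaps filled with NaN
outer = pd.concat([left, right], axis=0)

# join='inner': keep only the columns present in every input
inner = pd.concat([left, right], axis=0, join='inner')

print(outer)
print(inner)
```

With the outer join, columns B and C become floating point because NaN is inserted; with the inner join, only column A survives.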

Method 2: Using pd.concat() with ignore_index=True

By using the ignore_index parameter and setting it to True, the pd.concat() function will reset the index of the resulting DataFrame. This is particularly useful when the original index does not carry significant meaning and a new sequential index is preferred.

Here’s an example:

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'C': [9, 10], 'D': [11, 12]})
df2 = pd.DataFrame({'C': [13, 14], 'D': [15, 16]})

# Concatenating DataFrames with a reset index
result = pd.concat([df1, df2], ignore_index=True)

print(result)

Output:

    C   D
0   9  11
1  10  12
2  13  15
3  14  16

This code concatenates df1 and df2 while the ignore_index=True parameter creates a new 0-based index for the resulting DataFrame, avoiding any duplicate indices.
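To see why duplicate index labels can matter, here is a small sketch that reuses df1 and df2 from above. The variable names kept, fresh, and rejected are illustrative only. It also shows the verify_integrity option of pd.concat(), which raises an error instead of producing duplicate labels.

```python
import pandas as pd

df1 = pd.DataFrame({'C': [9, 10], 'D': [11, 12]})
df2 = pd.DataFrame({'C': [13, 14], 'D': [15, 16]})

kept = pd.concat([df1, df2])                      # index: 0, 1, 0, 1
fresh = pd.concat([df1, df2], ignore_index=True)  # index: 0, 1, 2, 3

# With duplicate labels, .loc[0] matches two rows and returns a DataFrame
print(kept.loc[0])

# With a fresh index, .loc[0] matches one row and returns a Series
print(fresh.loc[0])

# verify_integrity=True makes pd.concat raise on overlapping index labels
rejected = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as exc:
    rejected = True
    print("duplicate labels rejected:", exc)
```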

Method 3: Using append() method

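The append() method in pandas was a shortcut for the concat() function when combining DataFrames along rows. It offers simpler syntax for straightforward row-wise concatenation but is less flexible than concat(). Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below runs only on older versions; current code should use pd.concat() instead.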

Here’s an example:

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'E': [17, 18], 'F': [19, 20]})
df2 = pd.DataFrame({'E': [21, 22], 'F': [23, 24]})

# Appending DataFrames
# Note: DataFrame.append() was removed in pandas 2.0; this line needs pandas < 2.0
result = df1.append(df2)

print(result)

Output:

    E   F
0  17  19
1  18  20
0  21  23
1  22  24

The append() method combines df1 and df2, resulting in a DataFrame with the rows of df2 added to those of df1. It’s a concise way to concatenate along rows, retaining the original indices.

Method 4: Using append() with ignore_index=True

Similar to Method 2, the append() method accepts ignore_index=True to reset the index, producing a continuous sequence from 0 to the last row. Like append() itself, this works only in pandas versions before 2.0.

Here’s an example:

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'G': [25, 26], 'H': [27, 28]})
df2 = pd.DataFrame({'G': [29, 30], 'H': [31, 32]})

# Appending and resetting index
# Note: DataFrame.append() was removed in pandas 2.0; this line needs pandas < 2.0
result = df1.append(df2, ignore_index=True)

print(result)

Output:

    G   H
0  25  27
1  26  28
2  29  31
3  30  32

This example illustrates the combination of df1 with df2 using the append() method with a reset index, which creates a new DataFrame with consecutive indices.
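Because DataFrame.append() was removed in pandas 2.0, Methods 3 and 4 fail on current installations. As a sketch of one possible migration, both calls can be written with pd.concat(); the variable names appended and appended_reset are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'G': [25, 26], 'H': [27, 28]})
df2 = pd.DataFrame({'G': [29, 30], 'H': [31, 32]})

# Equivalent of df1.append(df2): original indices are kept (0, 1, 0, 1)
appended = pd.concat([df1, df2])

# Equivalent of df1.append(df2, ignore_index=True): new index 0..3
appended_reset = pd.concat([df1, df2], ignore_index=True)

print(appended)
print(appended_reset)
```

This version runs on both older and newer pandas releases, which makes it the safer choice for shared code.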

Bonus One-Liner Method 5: Using List Comprehension with pd.concat()

For more advanced users wishing to concatenate a list of DataFrames stored in a Python list or generated on-the-fly, list comprehension with pd.concat() can be applied in a one-liner fashion. This method is compact and suitable for dynamic concatenation tasks.

Here’s an example:

import pandas as pd

# Generating sample DataFrames in a list
dataframes = [pd.DataFrame({'I': [i, i+1], 'J': [i+2, i+3]}) for i in range(0, 8, 2)]

# Concatenating the whole list in a single pd.concat() call
result = pd.concat(dataframes, ignore_index=True)

print(result)

Output:

    I   J
0   0   2
1   1   3
2   2   4
3   3   5
4   4   6
5   5   7
6   6   8
7   7   9

Here, the code first creates a list of four DataFrames using a list comprehension, and then concatenates them all in one call to the pd.concat() function, with ignore_index=True to ensure a clean, ordered index.
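Returning to the monthly sales scenario from the introduction, it is often useful to keep track of which source each row came from. The sketch below uses made-up data in a hypothetical dict called monthly. It shows two options: the keys parameter of pd.concat(), which adds an outer index level, and a regular month column added with assign().

```python
import pandas as pd

# Hypothetical monthly sales data; names and figures are made up
monthly = {
    'Jan': pd.DataFrame({'item': ['pen', 'ink'], 'units': [10, 4]}),
    'Feb': pd.DataFrame({'item': ['pen', 'ink'], 'units': [7, 9]}),
}

# Option 1: label each input with keys=, producing a two-level index
yearly = pd.concat(
    list(monthly.values()),
    keys=list(monthly.keys()),
    names=['month', 'row'],
)
print(yearly)

# Rows for a single month can then be selected by the outer level
print(yearly.loc['Feb'])

# Option 2: flat index, with the month stored as a regular column
flat = pd.concat(
    [df.assign(month=name) for name, df in monthly.items()],
    ignore_index=True,
)
print(flat)
```

The first option keeps the original row labels inside each month; the second is usually easier to filter and group on later.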

Summary/Discussion

  • Method 1: Using pd.concat(). Strengths: Combines any number of DataFrames in one call and handles differing columns. Weaknesses: Keeps the original index labels, which can produce duplicate indices.
  • Method 2: Using pd.concat() with ignore_index=True. Strengths: Provides a clean, sequential index. Weaknesses: Discards the original index labels, which is a loss if they carry meaning.
  • Method 3: Using append() method. Strengths: Simplified syntax, good for quick tasks. Weaknesses: Removed in pandas 2.0, less flexible than pd.concat(), and can also result in duplicate indices.
  • Method 4: Using append() with ignore_index=True. Strengths: Consecutive indexing with simpler syntax. Weaknesses: Same as Method 3 with regard to availability and flexibility.
  • Method 5: Using list comprehension with pd.concat(). Strengths: Compact and elegant for concatenating multiple DataFrames. Weaknesses: Can be less readable to those unfamiliar with list comprehensions.