5 Best Ways to Concatenate Multiple Pandas DataFrames in Python

💡 Problem Formulation: When working with datasets in Python, a common task is to combine several pandas DataFrames into one. For instance, you may have monthly sales data in separate DataFrames and want to concatenate them into a single DataFrame for yearly analysis. Here, we'll explore several methods to concatenate more than two DataFrames in an efficient and Pythonic way.

Method 1: Using pd.concat()

This method uses the pd.concat() function, which is designed to concatenate pandas objects along a chosen axis (rows by default, columns with axis=1), with optional set logic on the other axis through the join parameter ('outer' or 'inner'). It accepts a list of any number of DataFrames and offers options for resetting the index (ignore_index) and for building a hierarchical index (keys).

Here's an example:

import pandas as pd

# Example DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Concatenating the DataFrames
result = pd.concat([df1, df2, df3])

# Display the result
print(result)

Output:

    A   B
0   1   3
1   2   4
0   5   7
1   6   8
0   9  11
1  10  12

This code snippet creates three simple DataFrames and then concatenates them into one using pd.concat(). The indices of the original DataFrames are preserved, so the labels 0 and 1 each appear three times in the result. That is not always what you want, and pd.concat() provides an option to ignore the original index and create a continuous numerical index instead.
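The keys and join options of pd.concat() are easier to judge in action. The following is a minimal sketch, not part of the original example: the labels 'jan' and 'feb' and the frame df_other with its column C are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# Hypothetical frame that shares only column A with df1
df_other = pd.DataFrame({'A': [9, 10], 'C': [11, 12]})

# keys= labels each input and produces a two-level (hierarchical) index
labeled = pd.concat([df1, df2], keys=['jan', 'feb'])
print(labeled.loc['feb'])  # only the rows that came from df2

# join='inner' keeps only the columns present in every input
shared = pd.concat([df1, df_other], join='inner', ignore_index=True)
print(list(shared.columns))
```

With the default join='outer', the second call would instead keep columns A, B and C and fill the gaps with NaN.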

Method 2: Using pd.concat() with ignore_index=True

The pd.concat() function also allows us to concatenate DataFrames while ignoring the original index and creating a new continuous index by setting the parameter ignore_index=True. This is particularly useful when the original index does not carry meaningful information after concatenation.

Here's an example:

# Concatenating the DataFrames with a new continuous index
result_with_new_index = pd.concat([df1, df2, df3], ignore_index=True)

# Display the result
print(result_with_new_index)

Output:

    A   B
0   1   3
1   2   4
2   5   7
3   6   8
4   9  11
5  10  12

In this example, the same DataFrames are concatenated, but with a new continuous index. Setting ignore_index=True in pd.concat() discards the original indices and makes the output DataFrame's index range from 0 to n-1, where n is the total number of rows.
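One practical reason to care about the index is label-based lookup. The sketch below, which reuses the same three example DataFrames, illustrates how duplicate labels behave with .loc and how ignore_index=True avoids them. The variable names stacked and renumbered are chosen for this illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Without ignore_index, label 0 occurs once per input frame
stacked = pd.concat([df1, df2, df3])
print(stacked.index.is_unique)   # duplicate labels are present
print(len(stacked.loc[0]))       # .loc[0] returns several rows, not one

# With ignore_index=True, every row gets its own label
renumbered = pd.concat([df1, df2, df3], ignore_index=True)
print(renumbered.index.is_unique)
print(renumbered.loc[0].tolist())  # a single row
```

If the frames are already concatenated, calling reset_index(drop=True) on the result should give the same renumbered index.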

Method 3: Using append() in a loop

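DataFrame.append() used to be a shortcut for concatenating two DataFrames, and older tutorials call it in a loop to build up one master DataFrame. It was deprecated in pandas 1.4 and removed in pandas 2.0, so result.append(df) now raises an AttributeError. It was also slow in a loop, because every call copied all rows accumulated so far. The replacement keeps the loop but appends each DataFrame to a plain Python list and calls pd.concat() once at the end.

Here's an example:

# Collect the DataFrames in a plain Python list
frames = []

# list.append() is cheap; DataFrame.append() no longer exists in pandas 2.x
for df in [df1, df2, df3]:
    frames.append(df)

# Concatenate once, after the loop
result = pd.concat(frames, ignore_index=True)

# Display the result
print(result)

Output:

    A   B
0   1   3
1   2   4
2   5   7
3   6   8
4   9  11
5  10  12

This code snippet starts with an empty list, appends the three example DataFrames to it in a loop, and then concatenates them with a single pd.concat() call. The ignore_index=True parameter ensures a continuous index in the final concatenated DataFrame.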
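A loop is most useful when the DataFrames are produced one at a time, for example one per month. The sketch below assumes a made-up dictionary monthly_sales that stands in for files or query results; it builds one frame per month, collects them in a list, and concatenates once.

```python
import pandas as pd

# Hypothetical monthly figures, standing in for files or query results
monthly_sales = {
    'jan': [100, 120],
    'feb': [90, 110],
    'mar': [130, 125],
}

frames = []
for month, amounts in monthly_sales.items():
    # The scalar month is broadcast to every row of this month's frame
    frame = pd.DataFrame({'month': month, 'sales': amounts})
    frames.append(frame)  # Python list.append, not DataFrame.append

# One concatenation after the loop instead of one per iteration
yearly = pd.concat(frames, ignore_index=True)
print(yearly)
```

Collecting first and concatenating once means the rows are copied a single time, however many months the loop processes.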

Method 4: Using List Comprehension and pd.concat()

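A list comprehension can be used in conjunction with pd.concat() to build the list of DataFrames inline in a concise and Pythonic way, and it scales to an arbitrary number of DataFrames. It earns its place when each DataFrame is filtered or transformed on the way in. For a list that already exists, as in the example here, the comprehension simply copies the list, and pd.concat(dataframes, ignore_index=True) gives the same result.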

Here's an example:

# DataFrames list
dataframes = [df1, df2, df3]

# Concatenate using List Comprehension
result = pd.concat([df for df in dataframes], ignore_index=True)

# Display the result
print(result)

Output:

    A   B
0   1   3
1   2   4
2   5   7
3   6   8
4   9  11
5  10  12

Here we create a list of DataFrames and concatenate them into one DataFrame using list comprehension inside pd.concat(). With ignore_index=True, a new continuous index is created for the concatenated DataFrame.
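A comprehension becomes more useful when it does some work on each DataFrame. As a sketch under assumed names (the dictionary named and the month labels are invented for this illustration), the following tags every frame with its source and skips empty ones before concatenating.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Hypothetical labels for the three example frames
named = {'jan': df1, 'feb': df2, 'mar': df3}

# assign() returns a copy with the extra column, so the inputs stay untouched;
# the condition drops any frame that has no rows
result = pd.concat(
    [df.assign(month=name) for name, df in named.items() if not df.empty],
    ignore_index=True,
)
print(result)
```

Because assign() works on a copy, df1, df2 and df3 keep their original two columns.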

Bonus One-Liner Method 5: Using reduce() from functools

For a more functional programming approach, you can use the reduce() function from the functools module with pd.concat() to concatenate a list of DataFrames. This method applies pd.concat() cumulatively to the items of the iterable, from left to right, so as to reduce the iterable to a single DataFrame.

Here's an example:

from functools import reduce

# Use reduce to apply pd.concat to concatenate
result = reduce(lambda x, y: pd.concat([x, y], ignore_index=True), dataframes)

# Display the result
print(result)

Output:

    A   B
0   1   3
1   2   4
2   5   7
3   6   8
4   9  11
5  10  12

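By using reduce() with lambda and pd.concat(), the DataFrames in the list are sequentially concatenated. With each iteration, the accumulated result is concatenated with the next DataFrame until a single DataFrame is produced. Note that every step copies all rows accumulated so far, so the total work grows faster than with a single pd.concat() call on the whole list.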
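To make the cost of the pairwise approach visible, the sketch below replaces the lambda with a small helper, concat_pair, that records how many rows each intermediate call writes. The helper and the rows_copied list are illustrative additions, not part of pandas or functools.

```python
from functools import reduce

import pandas as pd

dataframes = [
    pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    pd.DataFrame({'A': [5, 6], 'B': [7, 8]}),
    pd.DataFrame({'A': [9, 10], 'B': [11, 12]}),
]

rows_copied = []  # rows written by each intermediate pd.concat call


def concat_pair(left, right):
    """Hypothetical helper: concatenate two frames and log the result size."""
    combined = pd.concat([left, right], ignore_index=True)
    rows_copied.append(len(combined))
    return combined


via_reduce = reduce(concat_pair, dataframes)
via_single_call = pd.concat(dataframes, ignore_index=True)

print(rows_copied)                         # one entry per pairwise step
print(via_reduce.equals(via_single_call))  # both approaches give the same frame
```

For these three frames the pairwise route writes 4 rows and then 6 rows, while the single call writes the 6 final rows once. The gap widens as the list gets longer.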

Summary/Discussion

  • Method 1: Using pd.concat(). It's the standard way, allowing various customizations. Passing the whole list in one call is also the most efficient option, because the result is built in a single pass.
  • Method 2: Using pd.concat() with ignore_index=True. It provides a cleaner index in the final DataFrame but is otherwise similar to Method 1.
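  • Method 3: Using append() in a loop. This is an intuitive pattern that's easy to understand, but DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, and it copied the accumulated rows on every call. Appending to a Python list and calling pd.concat() once after the loop is the replacement.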
  • Method 4: Using List Comprehension and pd.concat(). Offers clean and readable code when each DataFrame has to be filtered or transformed on the way in. For a ready-made list it behaves the same as passing the list to pd.concat() directly.
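  • Method 5: Using reduce() from functools. This is a more functional programming approach, but it calls pd.concat() once per pair and re-copies the accumulated rows each time, so it is slower than a single pd.concat() call on long lists. It might also be less readable to those unfamiliar with functional programming paradigms.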