5 Best Ways to Drop Columns in a Pandas DataFrame

💡 Problem Formulation: When working with data in Python, you may encounter situations where you need to streamline your datasets by removing redundant or unnecessary columns. For instance, given a DataFrame with columns 'A', 'B', 'C', and 'D', you might want to eliminate columns 'B' and 'D' to focus on the most relevant data. The input and desired output below serve as the running example for each pandas method in this article:

Input DataFrame:
   A  B  C  D
0  1  3  5  7
1  2  4  6  8

Desired Output:
   A  C
0  1  5
1  2  6

Method 1: Using the drop Method

In pandas, the drop method removes the specified labels from rows or columns. Passing axis=1 tells the method to remove columns rather than rows. For those more comfortable with SQL, it is the reverse of a SELECT column list: you name the columns to leave out rather than the ones to show.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6], 'D': [7,8]})

# Drop columns 'B' and 'D'
df = df.drop(['B', 'D'], axis=1)
print(df)

Output:

   A  C
0  1  5
1  2  6

This code snippet demonstrates the removal of columns ‘B’ and ‘D’ from a DataFrame. After dropping the columns, only the data for columns ‘A’ and ‘C’ remain.
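
As a variation on the same call, drop also accepts a columns= keyword in place of axis=1, and an errors='ignore' option that skips labels that are not present. The following is a minimal sketch; the label 'Z' is a hypothetical missing column added purely for illustration, and the variable names trimmed and tolerant are not part of the original example.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# columns= is equivalent to passing the labels together with axis=1
trimmed = df.drop(columns=['B', 'D'])

# 'Z' does not exist (hypothetical label); errors='ignore' skips it
# instead of raising a KeyError
tolerant = df.drop(columns=['B', 'D', 'Z'], errors='ignore')

print(trimmed.columns.tolist())
print(tolerant.columns.tolist())
```

Both calls return a new DataFrame and leave df itself unchanged, which makes this form convenient inside method chains.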

Method 2: Using del Statement

The del statement is Python's built-in way of removing an item from a dictionary-like object by its key. Applied to a DataFrame, it drops the named column in place, modifying the original DataFrame without the need for reassignment. Each del statement removes a single column.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6], 'D': [7,8]})

# Remove column 'B'
del df['B']

# Remove column 'D'
del df['D']
print(df)

Output:

   A  C
0  1  5
1  2  6

This code snippet uses the del statement to remove columns ‘B’ and ‘D’ from the DataFrame, modifying the original DataFrame without the need to create a new one.
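
Because del works on one label at a time, removing several columns usually means a loop, and deleting a label that is no longer present raises a KeyError. The sketch below illustrates both points; the flag name missing_raised is an illustrative choice, not part of pandas.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# Remove several columns by issuing one del per label
for col in ['B', 'D']:
    del df[col]

# 'B' is already gone, so a second del raises KeyError
try:
    del df['B']
    missing_raised = False
except KeyError:
    missing_raised = True

print(df.columns.tolist())
print(missing_raised)
```

If missing labels are expected, checking `col in df.columns` before the del is one way to avoid the exception.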

Method 3: Selecting Specific Columns

Instead of dropping columns, you can select only the columns you want to retain. This approach is particularly handy when there are many columns to drop, making it more concise to specify the ones you want to keep.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6], 'D': [7,8]})

# Select only columns 'A' and 'C'
df = df[['A', 'C']]
print(df)

Output:

   A  C
0  1  5
1  2  6

The code snippet selects only the columns ‘A’ and ‘C’ from the original DataFrame, effectively discarding the other columns and creating a new DataFrame in the process.
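
When the list of columns to keep would itself be long, the selection can be derived from the unwanted labels instead. The sketch below assumes a boolean mask built with Index.isin is acceptable for your case; the names unwanted, keep_mask, and result are illustrative.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

unwanted = ['B', 'D']

# Boolean mask over the column labels: True means "keep this column"
keep_mask = ~df.columns.isin(unwanted)

# Select every row and only the columns where the mask is True
result = df.loc[:, keep_mask]
print(result.columns.tolist())
```

This keeps the original column order and does not fail if a label in unwanted is absent from the DataFrame.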

Method 4: Using pop Method

The pop method removes a single column and returns it as a Series, which can be useful if you want to use the data from the column right after dropping it from the DataFrame. It’s an in-place operation, so the original DataFrame is modified.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6], 'D': [7,8]})

# Pop and drop columns 'B' and 'D'
b_col = df.pop('B')
d_col = df.pop('D')
print(df)

Output:

   A  C
0  1  5
1  2  6

This snippet pops columns ‘B’ and ‘D’ out of the DataFrame. While the columns are dropped, their data is also captured in separate variables, which could be used subsequently in the code.
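
To make the return value concrete, the sketch below pops one column and inspects what comes back. Treating column 'D' as a "target" column is only an assumed use case for illustration.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# pop removes 'D' from df and returns it as a Series named 'D'
target = df.pop('D')

print(type(target).__name__)
print(target.name)
print(target.tolist())
print(df.columns.tolist())
```

The returned Series keeps the original index, so it can still be aligned with the remaining columns later on.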

Bonus One-Liner Method 5: Using drop with inplace=True

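Finally, you can pass inplace=True to the drop method to remove columns without assigning the result back to the DataFrame. This modifies the original DataFrame, and the call itself returns None. Note that it is not guaranteed to save memory, since pandas may still build a new copy of the data internally.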

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1,2], 'B': [3,4], 'C': [5,6], 'D': [7,8]})

# Drop columns 'B' and 'D' in place
df.drop(['B', 'D'], axis=1, inplace=True)
print(df)

Output:

   A  C
0  1  5
1  2  6

This one-liner code snippet drops columns ‘B’ and ‘D’ by modifying the original DataFrame in place, thereby saving the additional step of reassignment. Any memory benefit over reassignment is not guaranteed.
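
A common pitfall with inplace=True is assigning the call's result back to the variable. The short sketch below shows why that loses the DataFrame: the call returns None. The variable name result is illustrative.

```python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# With inplace=True the method mutates df and returns None
result = df.drop(['B', 'D'], axis=1, inplace=True)

print(result)
print(df.columns.tolist())
```

Writing `df = df.drop(..., inplace=True)` would therefore replace df with None, so use either reassignment or inplace=True, not both.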

Summary/Discussion

  • Method 1: Using drop Method. Versatile with the ability to work on rows and columns. Requires a new variable or reassignment.
  • Method 2: Using del Statement. Pythonic and modifies the DataFrame in place. Cannot be chained with other DataFrame methods.
  • Method 3: Selecting Specific Columns. Ideal for selectively choosing a small subset of columns, but creates a new DataFrame which may not be memory efficient for large DataFrames.
  • Method 4: Using pop Method. Returns the dropped column so its data can be used immediately, but can only operate on one column at a time.
  • Bonus Method 5: Using drop with inplace=True. In-place modification without needing reassignment. It returns None, is not guaranteed to save memory, and should be used with care to avoid unintended data loss.