5 Best Ways to Remove Columns in a Pandas DataFrame in Python

💡 Problem Formulation: Pandas DataFrames are the standard way to work with tabular data in Python, but a dataset often contains more columns than an analysis needs. Suppose you have a DataFrame ‘df’ with columns [‘A’, ‘B’, ‘C’, ‘D’] and want to remove ‘B’ and ‘D’, leaving a DataFrame with only columns [‘A’, ‘C’].

Method 1: Using drop() Method

The drop() method in pandas is a straightforward way to remove one or more columns from a DataFrame. You just need to specify the labels of the columns and the axis along which the labels will be dropped (axis=1 for columns).

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Drop columns 'B' and 'D'
df = df.drop(['B', 'D'], axis=1)
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

The code snippet above removes columns ‘B’ and ‘D’ from the DataFrame ‘df’ by passing them as a list to the drop() method while specifying axis=1 to indicate that columns (not rows) should be dropped.
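As a small variation, drop() also accepts a columns keyword, which makes the axis argument unnecessary, and an errors='ignore' option that skips labels that are not present. Here’s a minimal sketch using the same sample DataFrame; the label 'Z' is a made-up column used only to show the behavior:

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# columns= is equivalent to passing the labels together with axis=1
trimmed = df.drop(columns=['B', 'D'])

# errors='ignore' skips missing labels instead of raising a KeyError
# ('Z' is a hypothetical label that does not exist in df)
safe = df.drop(columns=['B', 'Z'], errors='ignore')

print(list(trimmed.columns))  # ['A', 'C']
print(list(safe.columns))     # ['A', 'C', 'D']
```

Without errors='ignore', asking drop() to remove a label that does not exist raises a KeyError, which is often the safer default when a typo in a column name should not pass silently.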

Method 2: Using Column Assignment

Columns can also be removed by selecting the subset of columns you want to keep and assigning the result back to the DataFrame variable. This approach is explicit about which columns are retained.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Retain only columns 'A' and 'C'
df = df[['A', 'C']]
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

This method involves explicitly assigning to ‘df’ a subset of its columns. Here, we keep only columns ‘A’ and ‘C’, effectively removing ‘B’ and ‘D’. This method is more about selecting what to keep rather than what to remove.
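It may help to see that the selection itself returns a new DataFrame, and the original only appears changed because the name ‘df’ is reassigned. The sketch below stores the result under a different name (subset, chosen here for illustration) and also shows that selecting a column that does not exist raises a KeyError; 'Z' is a made-up label:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Selecting with a list of labels returns a new DataFrame
subset = df[['A', 'C']]

print(list(df.columns))      # ['A', 'B', 'C', 'D'] (original untouched)
print(list(subset.columns))  # ['A', 'C']

# Selecting a missing label fails loudly ('Z' is hypothetical)
missing_raised = False
try:
    df[['A', 'Z']]
except KeyError:
    missing_raised = True

print(missing_raised)  # True
```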

Method 3: Using del Statement

The del statement is a Python built-in feature that can be used to delete objects. In the context of Pandas DataFrames, it can specifically delete columns by directly referencing them.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Delete columns 'B' and 'D'
del df['B']
del df['D']
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

By using the del statement, we directly remove columns ‘B’ and ‘D’ from the DataFrame ‘df’. This approach is Pythonic and modifies the DataFrame in place, so no new DataFrame is created. Note that each del statement removes a single column.
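To illustrate the in-place behavior, here’s a short sketch that deletes several columns in a loop and checks the result through a second name bound to the same object. The name alias is introduced only for this demonstration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# A second name for the same DataFrame object (not a copy)
alias = df

# del handles one column per statement, so loop over the labels
for col in ['B', 'D']:
    del df[col]

print(alias is df)          # True
print(list(alias.columns))  # ['A', 'C']
```

Because the change happens in place, any other variable that refers to the same DataFrame sees the columns disappear too, which is worth keeping in mind when a DataFrame is shared between functions.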

Method 4: Using pop() Method

The pop() method removes the specified column and returns its values as a Pandas Series. It’s useful when you want to not only remove a column but also use its data immediately without having to access the DataFrame again.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Pop column 'B'
popped_column_B = df.pop('B')
print(df)
print(popped_column_B)

Output:

   A  C   D
0  1  7  10
1  2  8  11
2  3  9  12

0    4
1    5
2    6
Name: B, dtype: int64

After using the pop() method, the DataFrame ‘df’ no longer includes the column ‘B’, and we have its data saved into ‘popped_column_B’, which can be used independently in subsequent operations.
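One common use of this pattern is splitting a table into input columns and a target column in a single step. The sketch below is only an illustration: the column names height, weight, and label are invented for the example.

```python
import pandas as pd

# Hypothetical dataset with two input columns and one target column
data = pd.DataFrame({
    'height': [1.6, 1.7, 1.8],
    'weight': [55, 68, 80],
    'label': [0, 1, 1]
})

# pop() removes 'label' from data in place and returns it as a Series
target = data.pop('label')

print(list(data.columns))  # ['height', 'weight']
print(target.name)         # label
print(target.tolist())     # [0, 1, 1]
```

Like del, pop() changes the DataFrame in place, and it raises a KeyError if the requested column does not exist.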

Bonus One-Liner Method 5: Using List Comprehension with Columns

You can combine list comprehension with the DataFrame’s columns attribute to selectively keep or drop columns. This is a concise and flexible one-liner approach.

Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Keep columns 'A' and 'C' using list comprehension
df = df[[col for col in df.columns if col not in ['B', 'D']]]
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

This code uses a list comprehension that iterates over the DataFrame’s columns and keeps only those whose names are not in the exclusion list [‘B’, ‘D’]. The resulting list of labels is then used to select the remaining columns, which lets us exclude specific columns in a single line.
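The same idea extends to any condition on the column name, not just membership in a fixed list. As a sketch, the example below drops every column whose name starts with a given prefix; the column names and the 'tmp_' prefix are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical DataFrame with two temporary columns
df = pd.DataFrame({
    'id': [1, 2, 3],
    'tmp_a': [4, 5, 6],
    'score': [7, 8, 9],
    'tmp_b': [10, 11, 12]
})

# Keep only the columns whose names do not start with 'tmp_'
df = df[[col for col in df.columns if not col.startswith('tmp_')]]

print(list(df.columns))  # ['id', 'score']
```

This works because the comprehension preserves the original column order and only relies on plain string methods, so the condition can be swapped for any test on the label.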

Summary/Discussion

  • Method 1: Using drop(). Strengths: Highly readable, very flexible. Weaknesses: Creates a new DataFrame by default, which could be less efficient with very large DataFrames.
  • Method 2: Column Assignment. Strengths: Explicit control over what to keep. Weaknesses: Less readable when removing many columns, as you have to type out all columns you want to retain.
  • Method 3: Using del. Strengths: Efficient, Pythonic. Weaknesses: Cannot be chained with other DataFrame methods.
  • Method 4: Using pop(). Strengths: Removes a column and provides its data directly. Weaknesses: Only works on a single column at a time.
  • Method 5: List Comprehension with Columns. Strengths: Concise and Pythonic. Weaknesses: Might be less readable for users unfamiliar with list comprehensions.
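If you want to convince yourself that the five approaches end up in the same place, a quick check along these lines can help. This is a sketch rather than a benchmark; make_df and the results dictionary are helper names introduced here, not part of Pandas:

```python
import pandas as pd

def make_df():
    # Fresh copy of the sample DataFrame for each method
    return pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9],
        'D': [10, 11, 12]
    })

results = {}

# Method 1: drop()
results['drop'] = make_df().drop(['B', 'D'], axis=1)

# Method 2: select the columns to keep
results['select'] = make_df()[['A', 'C']]

# Method 3: del, one column per statement
d = make_df()
del d['B']
del d['D']
results['del'] = d

# Method 4: pop(), discarding the returned Series
p = make_df()
p.pop('B')
p.pop('D')
results['pop'] = p

# Method 5: list comprehension over the column labels
c = make_df()
results['comprehension'] = c[[col for col in c.columns if col not in ['B', 'D']]]

# Compare every result against the drop() result
expected = results['drop']
for name, frame in results.items():
    print(name, frame.equals(expected))
```

Each line should print True, since every method leaves a DataFrame containing only columns ‘A’ and ‘C’ with the same values.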