5 Best Ways to Delete a Column from a DataFrame in Python


💡 Problem Formulation: When working with data in Python, manipulating dataframes with a library like pandas is a common task. At times, you may need to remove unnecessary or redundant columns from your dataset for analysis, memory efficiency, or data privacy reasons. For instance, if a dataframe has a column “unnecessary_info” that is not needed for your analysis, you can drop it so that the dataframe includes only relevant data.

Method 1: Drop using drop() Method

The drop() method in pandas is perhaps the most straightforward way to remove a column. Pass the column label together with axis=1 (axis=0, the default, refers to rows), or use the equivalent columns= keyword. By default, drop() returns a new dataframe and leaves the original untouched; set the inplace argument to True to modify the original instead.

Here’s an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
df.drop('B', axis=1, inplace=True)  # axis=1 targets columns; inplace=True modifies df directly
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

This code snippet creates a dataframe with three columns and then uses drop() with the column label ‘B’ to remove the second column. By setting inplace=True, it modifies the original dataframe instead of returning a new one.
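As a variation on the example above, the following sketch uses the columns= keyword instead of axis=1, drops more than one column in a single call, and leaves the original dataframe intact by not passing inplace. The errors='ignore' argument and the column name 'missing' are illustrative additions that do not appear in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# columns= is equivalent to passing the labels with axis=1
without_b = df.drop(columns='B')

# A list of labels drops several columns in one call
only_a = df.drop(columns=['B', 'C'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
# ('missing' is a hypothetical column name used only for illustration)
safe = df.drop(columns=['B', 'missing'], errors='ignore')

print(list(without_b.columns))  # ['A', 'C']
print(list(only_a.columns))     # ['A']
print(list(safe.columns))       # ['A', 'C']
print(list(df.columns))         # ['A', 'B', 'C'] (original is unchanged)
```

Because each call returns a new dataframe, this form can be chained with other dataframe methods.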

Method 2: Use the del Statement

Python’s del statement removes a name binding or an item from a container. Applied to a dataframe as del df['column'], it removes that column in place. Because del is a built-in statement rather than a method, it returns nothing, which makes it a quick and easy option for column removal.

Here’s an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
del df['B']  # removes column 'B' in place
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

In this snippet, we define a dataframe and then apply the del statement to remove the column ‘B’ from it directly. The del statement operates in place and does not return a new dataframe.
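One thing to watch for with del is that it raises a KeyError when the column does not exist. The sketch below shows two ways of handling that, assuming you would rather continue than stop; the column name 'missing' is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Option 1: check that the column exists before deleting it
column = 'B'
if column in df.columns:
    del df[column]

# Option 2: attempt the deletion and handle the failure
# ('missing' is a hypothetical column name that is not in df)
try:
    del df['missing']
    deleted = True
except KeyError:
    deleted = False

print(list(df.columns))  # ['A', 'C']
print(deleted)           # False
```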

Method 3: Drop Columns using pop() Method

The pop() method of a dataframe not only deletes the column but also returns its contents. This can be particularly useful if you need to use the removed data immediately after deleting it from the dataframe.

Here’s an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
removed_column = df.pop('B')  # removes 'B' in place and returns it as a Series
print(df)
print()
print(removed_column)

Output:

   A  C
0  1  7
1  2  8
2  3  9

0    4
1    5
2    6
Name: B, dtype: int64

This code directly removes the column ‘B’ from the dataframe and stores the removed column’s data in the variable removed_column. This might come in handy for subsequent computations or checks.
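To illustrate what “subsequent computations” might look like, here is a minimal sketch that pops a column, summarizes it, and stores a derived column back in the dataframe. The names total and 'B_doubled' are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# pop() removes 'B' from df and hands it back as a Series
removed_column = df.pop('B')

# The popped Series can be used like any other Series
total = removed_column.sum()

# Store a derived column ('B_doubled' is a hypothetical name)
df['B_doubled'] = removed_column * 2

print(total)             # 15
print(list(df.columns))  # ['A', 'C', 'B_doubled']
```

Note that pop() accepts a single column label, so removing several columns this way takes one call per column.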

Method 4: Select Columns to Keep

Instead of specifying which columns to delete, you can select and keep the desired columns, creating a new dataframe in the process. This method is useful when you have a clear idea of which columns are required for your analysis.

Here’s an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
df = df[['A', 'C']]  # keep only the listed columns
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

In this example, the original dataframe is reassigned to a new dataframe containing only the columns ‘A’ and ‘C’. The column ‘B’ is no longer part of the dataframe after this operation.
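When the list of columns to keep is long, typing every name by hand gets tedious. One possible workaround, sketched below, is to build the list from df.columns and exclude the unwanted names. The variable names to_remove, keep, and trimmed are illustrative, and the .copy() call is a defensive choice so that later changes to the result do not touch the original.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Names of the columns we do not want (hypothetical selection)
to_remove = {'B'}

# Build the keep list from the existing columns, preserving their order
keep = [col for col in df.columns if col not in to_remove]

# Select the kept columns into an independent dataframe
trimmed = df[keep].copy()

print(keep)              # ['A', 'C']
print(list(df.columns))  # ['A', 'B', 'C'] (original is unchanged)
```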

Bonus One-Liner Method 5: Drop using iloc

The iloc indexer, short for “integer location,” is a pandas feature that selects rows and columns by their integer position rather than by label. To drop a column, overwrite the dataframe with a selection of every column position except the one to be dropped.

Here’s an example:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
df = df.iloc[:, [0, 2]]  # all rows, column positions 0 and 2
print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

Here, the dataframe is updated to include only the 0th and 2nd columns (corresponding to columns ‘A’ and ‘C’), effectively dropping the 1st column (column ‘B’).
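Hard-coding positions such as [0, 2] is fragile if the column order changes. As a sketch of one alternative, the position can be looked up from the label first and the keep list computed from it. This assumes the column labels are unique; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Look up the integer position of the column to drop
# (returns a plain integer when the column labels are unique)
position = df.columns.get_loc('B')

# Every column position except the one being dropped
keep_positions = [i for i in range(df.shape[1]) if i != position]

result = df.iloc[:, keep_positions]

print(position)              # 1
print(keep_positions)        # [0, 2]
print(list(result.columns))  # ['A', 'C']
```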

Summary/Discussion

Method 1: Using drop(). Versatile and explicit, and it accepts a list of labels. By default it returns a new dataframe, which costs extra memory with large dataframes.
Method 2: Using del statement. Pythonic and in-place operation. Cannot be chained with other dataframe methods.
Method 3: Using pop(). Removes and returns the column’s data. Like del, cannot be chained.
Method 4: Selecting columns to keep. Great for cleaner code when keeping a small subset. Requires rewriting all column names you wish to keep.
Method 5: Using iloc. Effective for dropping based on column index. Requires knowledge of column positions and may become error-prone with many columns.
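The in-place versus new-object distinction in the summary can be checked with a small side-by-side sketch. The helper make_df() and the results dictionary are hypothetical names introduced only for this comparison.

```python
import pandas as pd

def make_df():
    """Build a fresh copy of the example dataframe (hypothetical helper)."""
    return pd.DataFrame({
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]
    })

# Methods 1, 4, and 5 return a new dataframe and leave the source alone
source = make_df()
results = {
    'drop': source.drop(columns='B'),
    'select': source[['A', 'C']],
    'iloc': source.iloc[:, [0, 2]],
}
source_unchanged = list(source.columns) == ['A', 'B', 'C']

# Methods 2 and 3 modify the dataframe they are applied to
df_del = make_df()
del df_del['B']
results['del'] = df_del

df_pop = make_df()
df_pop.pop('B')
results['pop'] = df_pop

# All five approaches end up with the same columns
for name, frame in results.items():
    print(name, list(frame.columns))

print(source_unchanged)  # True
```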