How to Select Multiple Columns in Pandas

The easiest way to select multiple columns in Pandas is to pass a list of column names into the standard square-bracket indexing scheme. For example, the expression df[['Col_1', 'Col_4', 'Col_7']] accesses the columns 'Col_1', 'Col_4', and 'Col_7'. This is the most flexible and concise approach when you need only a couple of columns.

Read on to learn three ways to accomplish this, each suited to a different scenario!

Problem Formulation

Say, you create the following Pandas DataFrame:

import pandas as pd

# Create DataFrame
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]

df = pd.DataFrame(data, columns = ['Name', 'Age', 'Height',
                                   'Income', 'Eyes', 'Hairs'])

It looks like this:

print(df)
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''

Problem: How to select multiple columns from this DataFrame?

For example, how do you select the columns ['Name', 'Income', 'Eyes', 'Hairs'] from the DataFrame to obtain a new DataFrame that contains only those four columns?

Method 1: Basic List-Based Indexing

List-based indexing in Pandas allows you to pass multiple column names as a list into the square-bracket selector. For example, df[['A', 'B', 'C']] would select columns 'A', 'B', and 'C' of the DataFrame df. The resulting DataFrame has the columns in the order of the passed list.

# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''

Here’s how you’d select columns ['Name', 'Income', 'Eyes', 'Hairs'] from the DataFrame in the problem formulation:

# Method 1: List-Based Indexing
df_1 = df[['Name', 'Income', 'Eyes', 'Hairs']]
print(df_1)
'''
    Name  Income  Eyes   Hairs
0  Alice  100000  blue  blonde
1    Bob   20000  blue   black
2   Carl  120000  grey    grey
'''

The order of the columns matters. If you reverse them, you get the following DataFrame with columns in reversed order:

df_1 = df[['Hairs', 'Eyes', 'Income', 'Name']]
print(df_1)
'''
    Hairs  Eyes  Income   Name
0  blonde  blue  100000  Alice
1   black  blue   20000    Bob
2    grey  grey  120000   Carl
'''
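
One detail worth checking with this method is the difference between single and double brackets. The following minimal sketch rebuilds the example DataFrame and shows that a single label returns a Series, while a list of labels returns a DataFrame, even if the list holds only one name. The variable names single and double and the column name 'Salary' are only illustrative and do not appear in the original example.

```python
import pandas as pd

# Rebuild the example DataFrame from the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

single = df['Name']      # one label -> pandas Series
double = df[['Name']]    # list with one label -> one-column DataFrame

print(type(single).__name__)   # Series
print(type(double).__name__)   # DataFrame

# Asking for a name that is not a column raises a KeyError
# ('Salary' is a hypothetical name used only to trigger the error)
try:
    df[['Name', 'Salary']]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)          # True
```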

Method 2: Integer-Based Indexing

You can use the df.iloc[rows, columns] syntax to access individual columns using zero-based indexing with the first column having index 0, the second index 1, and so on.

  • rows selects individual rows. You can use the slicing colon : to indicate that all rows should be selected.
  • columns selects individual columns. You can pass a list of integer column positions.

# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''

Here’s an example on the DataFrame from the problem formulation:

df_2 = df.iloc[:, [0, 3, 4, 5]]
print(df_2)
'''
    Name  Income  Eyes   Hairs
0  Alice  100000  blue  blonde
1    Bob   20000  blue   black
2   Carl  120000  grey    grey
'''

You can also use slicing as an alternative to the explicit list-based argument to access multiple columns—this is one of the main advantages of using df.iloc[...] after all!

df_2 = df.iloc[:, 3:6]
print(df_2)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''

In the example, the start index is 3 (included) and the stop index is 6 (excluded). So, all columns with identifiers 3, 4, and 5 are added to the final DataFrame.
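
Because df.iloc[...] follows standard Python slicing rules, step sizes and negative indices should work as well. The sketch below illustrates this on the example DataFrame and also shows one possible way to look up integer positions from column names with df.columns.get_indexer(...). The variable names every_other, last_two, positions, and by_position are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

# Step slicing: every second column (positions 0, 2, 4)
every_other = df.iloc[:, ::2]
print(list(every_other.columns))   # ['Name', 'Height', 'Eyes']

# Negative slicing: the last two columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))      # ['Eyes', 'Hairs']

# Translate column names into integer positions, then select by position
positions = df.columns.get_indexer(['Name', 'Income', 'Eyes', 'Hairs'])
by_position = df.iloc[:, positions]
print(list(positions))             # [0, 3, 4, 5]
```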

Method 3: Name-Based Indexing

To select multiple columns by name, you can also use the df.loc[...] selector. It allows you to use slicing on column names instead of integer positions, which can be more convenient. Unlike integer-based slicing, a label-based slice includes both the start and the stop label.

Let’s quickly recap the original DataFrame:

# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''

The following example shows how to select columns 'Income', 'Eyes', and 'Hairs':

# Method 3: Name-Based Indexing
df_3 = df.loc[:, 'Income':'Hairs']
print(df_3)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''

Alternatively, you can explicitly list the column names by passing a list as the second (column) indexing argument:

df_3 = df.loc[:, ['Income', 'Eyes', 'Hairs']]
print(df_3)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''
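
The behavior of the stop label is easy to overlook: with df.loc[...], the slice 'Income':'Hairs' contains the column 'Hairs', whereas the integer slice 3:6 in df.iloc[...] stops before position 6. The following short sketch compares both on the example DataFrame. The variable names by_label and by_position are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

by_label = df.loc[:, 'Income':'Hairs']   # stop label 'Hairs' is included
by_position = df.iloc[:, 3:6]            # stop position 6 is excluded

print(list(by_label.columns))            # ['Income', 'Eyes', 'Hairs']
print(by_label.equals(by_position))      # True
```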

Summary

There are three main ways to access multiple columns from a DataFrame:

  • Basic List-Based Indexing such as df[['A', 'B', 'C']] to access three columns 'A', 'B', and 'C'.
  • Integer-Based Indexing such as df.iloc[:, 1:3] to access the second and third columns using the rules of standard slicing.
  • Name-Based Indexing such as df.loc[:, 'A':'C'] to access three columns 'A', 'B', and 'C'.
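
As a quick sanity check, the three approaches can be compared side by side. The sketch below applies each of them to the example DataFrame from the problem formulation and confirms that they return the same columns. The variable names a, b, and c are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

a = df[['Income', 'Eyes', 'Hairs']]    # list-based indexing
b = df.iloc[:, 3:6]                    # integer-based indexing
c = df.loc[:, 'Income':'Hairs']        # name-based indexing

print(a.equals(b) and b.equals(c))     # True
```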

Learn Pandas the Fun Way by Solving Code Puzzles

If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas.

It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?

Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.