The easiest way to select multiple columns in Pandas is to pass a list of column names into the standard square-bracket indexing scheme. For example, the expression df[['Col_1', 'Col_4', 'Col_7']] accesses the columns 'Col_1', 'Col_4', and 'Col_7'. This is the most flexible and concise approach when you need only a couple of columns.
To learn about the best 3 ways to accomplish this in alternative scenarios, read on!
Problem Formulation
Say, you create the following Pandas DataFrame:
import pandas as pd
# Create DataFrame
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
['Bob', 37, 164, 20000, 'blue', 'black'],
['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Height',
'Income', 'Eyes', 'Hairs'])
It looks like this:
print(df)
'''
Name Age Height Income Eyes Hairs
0 Alice 24 168 100000 blue blonde
1 Bob 37 164 20000 blue black
2 Carl 18 201 120000 grey grey
'''
Problem: How to select multiple columns from this DataFrame?
For example, how do you select the columns ['Name', 'Income', 'Eyes', 'Hairs'] to obtain a new DataFrame that contains only those four columns?
Method 1: Basic List-Based Indexing
List-based indexing in Pandas allows you to pass multiple column names as a list into the square-bracket selector. For example, df[['A', 'B', 'C']] would select columns 'A', 'B', and 'C' of the DataFrame df. The resulting DataFrame has the columns in the order of the passed list.
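One detail worth keeping in mind is the difference between single and double brackets. The following is a minimal sketch that rebuilds the DataFrame from the problem formulation; the column name 'Salary' is a hypothetical name used only to show what happens when a requested column does not exist.

```python
import pandas as pd

# Same data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

# Single brackets with a string return a Series
single = df['Name']

# Double brackets (a list inside the selector) return a DataFrame,
# even if the list holds just one column name
frame = df[['Name']]

print(type(single).__name__)  # Series
print(type(frame).__name__)   # DataFrame

# Asking for a column that does not exist raises a KeyError
# ('Salary' is a hypothetical, nonexistent column)
try:
    df[['Name', 'Salary']]
    missing_raises = False
except KeyError:
    missing_raises = True

print(missing_raises)  # True
```

So if you want a DataFrame back, pass a list, even for a single column.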
# Original DataFrame:
'''
Name Age Height Income Eyes Hairs
0 Alice 24 168 100000 blue blonde
1 Bob 37 164 20000 blue black
2 Carl 18 201 120000 grey grey
'''
Here’s how you’d select columns ['Name', 'Income', 'Eyes', 'Hairs'] from the DataFrame in the problem formulation:
# Method 1: List-Based Indexing
df_1 = df[['Name', 'Income', 'Eyes', 'Hairs']]
print(df_1)
'''
Name Income Eyes Hairs
0 Alice 100000 blue blonde
1 Bob 20000 blue black
2 Carl 120000 grey grey
'''
The order of the columns matters. If you reverse them, you get the following DataFrame with columns in reversed order:
df_1 = df[['Hairs', 'Eyes', 'Income', 'Name']]
print(df_1)
'''
Hairs Eyes Income Name
0 blonde blue 100000 Alice
1 black blue 20000 Bob
2 grey grey 120000 Carl
'''
Method 2: Integer-Based Indexing
You can use the df.iloc[rows, columns] syntax to access individual columns using zero-based indexing with the first column having index 0, the second index 1, and so on.
- rows selects individual rows. You can use the slicing colon : to indicate that all rows should be selected.
- columns selects individual columns. You can pass a list of column identifiers as integers.
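If you know the column names but not their integer positions, you can look the positions up instead of counting by hand. The sketch below is one possible way to do that, using the get_indexer and get_loc methods of df.columns; the variable names are chosen only for this illustration.

```python
import pandas as pd

# Same data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

# Position of a single column
income_pos = df.columns.get_loc('Income')
print(income_pos)  # 3

# Positions of several columns at once
positions = df.columns.get_indexer(['Name', 'Income', 'Eyes', 'Hairs'])
print(positions.tolist())  # [0, 3, 4, 5]

# The positions can be passed straight to iloc
selected = df.iloc[:, positions]
print(list(selected.columns))  # ['Name', 'Income', 'Eyes', 'Hairs']
```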
# Original DataFrame:
'''
Name Age Height Income Eyes Hairs
0 Alice 24 168 100000 blue blonde
1 Bob 37 164 20000 blue black
2 Carl 18 201 120000 grey grey
'''
Here’s an example on the DataFrame from the problem formulation:
df_2 = df.iloc[:, [0, 3, 4, 5]]
print(df_2)
'''
Name Income Eyes Hairs
0 Alice 100000 blue blonde
1 Bob 20000 blue black
2 Carl 120000 grey grey
'''
You can also use slicing as an alternative to the explicit list-based argument to access multiple columns. This is one of the main advantages of using df.iloc[...] after all!
df_2 = df.iloc[:, 3:6]
print(df_2)
'''
Income Eyes Hairs
0 100000 blue blonde
1 20000 blue black
2 120000 grey grey
'''
In the example, the start index is 3 (included) and the stop index is 6 (excluded). So, all columns with identifiers 3, 4, and 5 are added to the final DataFrame.
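Because df.iloc[...] follows the rules of standard Python slicing, open-ended slices, negative indices, and step sizes work as well. Here is a short sketch of a few such variations on the same DataFrame; the variable names are only illustrative.

```python
import pandas as pd

# Same data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

# First three columns (stop index 3 is excluded)
first_three = df.iloc[:, :3]
print(list(first_three.columns))  # ['Name', 'Age', 'Height']

# Last two columns via a negative start index
last_two = df.iloc[:, -2:]
print(list(last_two.columns))     # ['Eyes', 'Hairs']

# Every second column via a step size of 2
every_other = df.iloc[:, ::2]
print(list(every_other.columns))  # ['Name', 'Height', 'Eyes']
```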
Method 3: Name-Based Indexing
To select multiple columns by name, you can also use the df.loc[...] selector. It allows you to use slicing on column names instead of integer identifiers, which can be more comfortable. Unlike integer-based slicing, a name-based slice includes both the start and the stop column.
Let’s quickly recap the original DataFrame:
# Original DataFrame:
'''
Name Age Height Income Eyes Hairs
0 Alice 24 168 100000 blue blonde
1 Bob 37 164 20000 blue black
2 Carl 18 201 120000 grey grey
'''
The following example shows how to select columns 'Income', 'Eyes', and 'Hairs':
# Method 3: Name-Based Indexing
df_3 = df.loc[:, 'Income':'Hairs']
print(df_3)
'''
Income Eyes Hairs
0 100000 blue blonde
1 20000 blue black
2 120000 grey grey
'''
Alternatively, you can explicitly list the specific column names by passing a list as the second (column) indexing argument:
df_3 = df.loc[:, ['Income', 'Eyes', 'Hairs']]
print(df_3)
'''
Income Eyes Hairs
0 100000 blue blonde
1 20000 blue black
2 120000 grey grey
'''
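A common source of confusion is that the two slicing styles treat the stop value differently: df.loc[...] includes the stop label, while df.iloc[...] excludes the stop position. The following sketch puts both side by side on the same DataFrame; the variable names are only illustrative.

```python
import pandas as pd

# Same data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

# Name-based slice: the stop label 'Hairs' is included
by_name = df.loc[:, 'Income':'Hairs']
print(list(by_name.columns))  # ['Income', 'Eyes', 'Hairs']

# Integer-based slice: the stop position 5 ('Hairs') is excluded
by_pos = df.iloc[:, 3:5]
print(list(by_pos.columns))   # ['Income', 'Eyes']
```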
Summary
There are three main ways to access multiple columns from a DataFrame:
- Basic List-Based Indexing such as df[['A', 'B', 'C']] to access three columns 'A', 'B', and 'C'.
- Integer-Based Indexing such as df.iloc[:, 1:3] to access the second and third columns using the rules of standard slicing.
- Name-Based Indexing such as df.loc[:, 'A':'C'] to access three columns 'A', 'B', and 'C'.
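To see that the three methods are interchangeable for a contiguous block of columns, here is a small sketch that selects 'Income', 'Eyes', and 'Hairs' in all three ways and compares the results with DataFrame.equals. It assumes the DataFrame from the problem formulation.

```python
import pandas as pd

# Same data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height',
                                 'Income', 'Eyes', 'Hairs'])

df_1 = df[['Income', 'Eyes', 'Hairs']]   # Method 1: list of names
df_2 = df.iloc[:, 3:6]                   # Method 2: integer positions
df_3 = df.loc[:, 'Income':'Hairs']       # Method 3: name-based slice

# All three selections hold the same columns and values
print(df_1.equals(df_2))  # True
print(df_2.equals(df_3))  # True
```

Which one to pick mostly depends on whether you know the column names or only their positions.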
