The easiest way to select multiple columns in Pandas is to pass a list into the standard square-bracket indexing scheme. For example, the expression df[['Col_1', 'Col_4', 'Col_7']] would access columns 'Col_1', 'Col_4', and 'Col_7'. This is the most flexible and concise way when you only need a couple of columns.
To learn about the three best ways to accomplish this in different scenarios, read on!
Problem Formulation
Say you create the following Pandas DataFrame:
```python
import pandas as pd

# Create DataFrame
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]

df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])
```
It looks like this:
```python
print(df)
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''
```
Problem: How do you select multiple columns from this DataFrame?
For example, how do you select the columns ['Name', 'Income', 'Eyes', 'Hairs'] to obtain a new DataFrame that contains only these four columns, in this order?
Method 1: Basic List-Based Indexing
List-based indexing in Pandas allows you to pass multiple column names as a list into the square-bracket selector. For example, df[['A', 'B', 'C']] would select columns 'A', 'B', and 'C' of the DataFrame df. The resulting DataFrame has the columns in the order of the passed list.
```python
# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''
```
Here’s how you’d select columns ['Name', 'Income', 'Eyes', 'Hairs'] from the DataFrame in the problem formulation:
```python
# Method 1: List-Based Indexing
df_1 = df[['Name', 'Income', 'Eyes', 'Hairs']]
print(df_1)
'''
    Name  Income  Eyes   Hairs
0  Alice  100000  blue  blonde
1    Bob   20000  blue   black
2   Carl  120000  grey    grey
'''
```
The order of the columns matters. If you reverse them, you get the following DataFrame with columns in reversed order:
```python
df_1 = df[['Hairs', 'Eyes', 'Income', 'Name']]
print(df_1)
'''
    Hairs  Eyes  Income   Name
0  blonde  blue  100000  Alice
1   black  blue   20000    Bob
2    grey  grey  120000   Carl
'''
```
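A related detail of list-based indexing is that the double brackets matter. The following minimal sketch reuses the sample data from the problem formulation and shows that a single column name returns a one-dimensional Series, while a list containing that one name still returns a DataFrame. The variable names name_series and name_frame are only illustrative.

```python
import pandas as pd

# Same sample data as in the problem formulation
data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# A single column name (no list) returns a one-dimensional Series
name_series = df['Name']

# A list with one column name returns a DataFrame with a single column
name_frame = df[['Name']]

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_frame.shape)            # (3, 1)
```

If you want to keep working with a DataFrame later on, for example to add more columns, the list form is usually the safer choice.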
Method 2: Integer-Based Indexing
You can use the df.iloc[rows, columns] syntax to access columns by their integer position, using zero-based indexing: the first column has index 0, the second index 1, and so on.
- rows selects individual rows. You can use the slicing colon : to indicate that all rows should be selected.
- columns selects individual columns. You can pass a list of integer column positions.
```python
# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''
```
Here’s an example on the DataFrame from the problem formulation:
```python
# Method 2: Integer-Based Indexing
df_2 = df.iloc[:, [0, 3, 4, 5]]
print(df_2)
'''
    Name  Income  Eyes   Hairs
0  Alice  100000  blue  blonde
1    Bob   20000  blue   black
2   Carl  120000  grey    grey
'''
```
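Because df.iloc[...] follows Python's indexing rules, negative positions count from the last column, and the rows argument accepts a slice just like the columns argument does. The sketch below, on the same sample data (the names last_two and first_rows are illustrative), selects the last two columns for all rows, and then 'Name' and 'Income' for only the first two rows.

```python
import pandas as pd

data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# Negative positions count from the end: -2 is 'Eyes', -1 is 'Hairs'
last_two = df.iloc[:, [-2, -1]]
print(last_two)

# Rows 0 and 1 (stop index 2 is excluded), columns at positions 0 and 3
first_rows = df.iloc[0:2, [0, 3]]
print(first_rows)
```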
You can also use slicing instead of an explicit list of positions to access multiple columns. After all, slicing support is one of the main advantages of df.iloc[...]!
```python
df_2 = df.iloc[:, 3:6]
print(df_2)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''
```
In the example, the start index is 3 (included) and the stop index is 6 (excluded). So the columns at positions 3, 4, and 5 ('Income', 'Eyes', and 'Hairs') end up in the resulting DataFrame.
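Since these are the standard Python slicing rules, you can also add a step or leave out the start or stop index. A short sketch on the same sample data (variable names are illustrative):

```python
import pandas as pd

data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# A step of 2 picks positions 0, 2, and 4: 'Name', 'Height', 'Eyes'
every_other = df.iloc[:, ::2]

# Omitting the start selects positions 0, 1, and 2 (stop index 3 is excluded)
first_three = df.iloc[:, :3]

# A negative start selects the last three columns
last_three = df.iloc[:, -3:]

print(list(every_other.columns))  # ['Name', 'Height', 'Eyes']
print(list(first_three.columns))  # ['Name', 'Age', 'Height']
print(list(last_three.columns))   # ['Income', 'Eyes', 'Hairs']
```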
Method 3: Name-Based Indexing
To select multiple columns by name, you can also use the df.loc[...] selector. It allows you to slice on column names instead of integer positions, which can be more convenient. Note that, unlike integer-based slicing, a label slice includes both the start and the stop column.
Let’s quickly recap the original DataFrame:
```python
# Original DataFrame:
'''
    Name  Age  Height  Income  Eyes   Hairs
0  Alice   24     168  100000  blue  blonde
1    Bob   37     164   20000  blue   black
2   Carl   18     201  120000  grey    grey
'''
```
The following example shows how to select columns 'Income', 'Eyes', and 'Hairs':
```python
# Method 3: Name-Based Indexing
df_3 = df.loc[:, 'Income':'Hairs']
print(df_3)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''
```
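The stop label of a df.loc[...] slice is included, which differs from the integer slices of df.iloc[...]. This sketch on the same sample data compares a label slice with the position slice that selects the same columns:

```python
import pandas as pd

data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# Label slice: both 'Age' and 'Income' are included
by_label = df.loc[:, 'Age':'Income']

# Position slice: stop index 4 is excluded, so it ends at 'Income' (position 3)
by_position = df.iloc[:, 1:4]

print(list(by_label.columns))        # ['Age', 'Height', 'Income']
print(by_label.equals(by_position))  # True
```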
Alternatively, you can explicitly list the specific column names, passing the list as the second (column) indexing argument:
```python
df_3 = df.loc[:, ['Income', 'Eyes', 'Hairs']]
print(df_3)
'''
   Income  Eyes   Hairs
0  100000  blue  blonde
1   20000  blue   black
2  120000  grey    grey
'''
```
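Besides slices and explicit lists, df.loc[...] also accepts a boolean array as the column argument, with one True/False value per column. As a sketch (the "name starts with H" rule is only an illustrative condition), the mask produced by df.columns.str.startswith selects every column whose name begins with 'H':

```python
import pandas as pd

data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# Boolean mask with one entry per column name
mask = df.columns.str.startswith('H')

# Keep all rows and only the columns where the mask is True
h_columns = df.loc[:, mask]

print(list(h_columns.columns))  # ['Height', 'Hairs']
```

This is handy when the column names follow a pattern and you don't want to type each name by hand.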
Summary
There are three main ways to access multiple columns from a DataFrame:
- Basic List-Based Indexing such as df[['A', 'B', 'C']] to access the three columns 'A', 'B', and 'C'.
- Integer-Based Indexing such as df.iloc[:, 1:3] to access the second and third columns using the rules of standard slicing.
- Name-Based Indexing such as df.loc[:, 'A':'C'] to access all columns from 'A' to 'C', with both 'A' and 'C' included.
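To tie the three approaches together, here is a small sketch that reuses the problem-formulation data, selects 'Income', 'Eyes', and 'Hairs' each way, and uses DataFrame.equals to check that the results match:

```python
import pandas as pd

data = [['Alice', 24, 168, 100000, 'blue', 'blonde'],
        ['Bob', 37, 164, 20000, 'blue', 'black'],
        ['Carl', 18, 201, 120000, 'grey', 'grey']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Height', 'Income', 'Eyes', 'Hairs'])

# The same three columns, selected with each method
via_list = df[['Income', 'Eyes', 'Hairs']]    # Method 1: list of names
via_iloc = df.iloc[:, 3:6]                    # Method 2: integer positions
via_loc = df.loc[:, 'Income':'Hairs']         # Method 3: label slice

print(via_list.equals(via_iloc))  # True
print(via_list.equals(via_loc))   # True
```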
Learn Pandas the Fun Way by Solving Code Puzzles
If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).
It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?
Coffee Break Pandas offers a fun approach to data science mastery and a truly gamified learning experience.