How to Read Specific Columns from CSV File in Python

[toc]

A Quick Glance at The Solutions [Each solution stays for 5-10 secs.]

Problem: Given a CSV file, how to read only specific column(s) from the csv file? (Reading a specific column from a csv file will yield all the row values pertaining to that column.)

Example: Consier the following csv file (countries.csv):

Country,Capital,Population,Area
Germany,Berlin,"84,267,549","348,560"
France,Paris,"65,534,239","547,557"
Spain,Madrid,"46,787,468","498,800"
Italy,Rome,"60,301,346","294,140"
India,Delhi,"1,404,495,187","2,973,190"
USA,Washington,"334,506,463","9,147,420"
China,Beijing,"1,449,357,022","9,388,211"
Poland,Warsaw,"37,771,789","306,230"
Russia,Moscow,"146,047,418","16,376,870"
England,London,"68,529,747","241,930"

Question: How will you read the above csv file and display the following columns –

  1. Country column along with the Capital column?
  2. All values in the population column?

Method 1: Using Pandas

Using the Pandas library is probably the best option if you are dealing with csv files. You can easily read a csv file and store an entire column within a variable.

Code:

import pandas as pd

df = pd.read_csv("countries.csv")
country = df['Country']
# or
# country = df.Country
capital = df['Capital']
# or
# capital = df.Capital

# displaying selected columns (Country and Capital)
for x, y in zip(country, capital):
    print(f"{x}       {y}")
    
# displaying a single column (Country)
print()
print(df['Population'])

Output:

Germany       Berlin
France       Paris
Spain       Madrid
Italy       Rome
India       Delhi
USA       Washington
China       Beijing
Poland       Warsaw
Russia       Moscow
England       London

0       84,267,549
1       65,534,239
2       46,787,468
3       60,301,346
4    1,404,495,187
5      334,506,463
6    1,449,357,022
7       37,771,789
8      146,047,418
9       68,529,747
Name: Population, dtype: object

Explanation:

  • Read the csv file using pd.read_csv() Pandas function.
  • Save all the information of the columns Country and Capital within independent variables using –
    • country = df['Country']
      • Alternatively, you can also use country = df.Country
    • capital = df['Capital']
      • Alternatively, you can also use capital = df.Capital
  • To display the country names and their capitals simultaneously, you can bind the two columns, country and capital, using the zip() function and then display each country along with its capital using a for loop upon the zipped object.
  • To display all the values in the population column, you can simply use df['Population'].

TRIVIA
zip() is a built-in function in Python that takes an arbitrary number of iterables and binds them into a single iterable, a zip object. It combines the n-th value of each iterable argument into a tuple.
Read more about zip() here.

List-Based Indexing of a DataFrame

In case you are not comfortable with using zip() to display multiple columns at once, you have another option. You can simply use list-based indexing to accomplish your goal.

List-based indexing is a technique that allows you to pass multiple column names as a list within the square-bracket selector.

Example:

import pandas as pd

df = pd.read_csv("countries.csv")
print()
print(df[['Country', 'Capital']])

Output:

   Country     Capital
0  Germany      Berlin
1   France       Paris
2    Spain      Madrid
3    Italy        Rome
4    India       Delhi
5      USA  Washington
6    China     Beijing
7   Poland      Warsaw
8   Russia      Moscow
9  England      London

Method 2: Integer Based Indexing with iloc

Approach: The idea here is to use the df.iloc[rows, columns].values to access individual columns from the DataFrame using indexing. Note that the first column always has the index 0, while the second column has index 1, and so on.

  • rows is used to select individual rows. Use the slicing colon: to ensure all rows have been selected.
  • columns is used to select individual columns.
    • Use country = data.iloc[:, 0].values to save the values of the Country column.
    • capital = data.iloc[:, 1].values to save the values of the Capital column.
    • population = data.iloc[:, 2].values to save the values of the Population column.
import pandas as pd

data = pd.read_csv('countries.csv')
country = data.iloc[:, 0].values
capital = data.iloc[:, 1].values
population = data.iloc[:, 2].values
# displaying selected columns
print(data[['Country', 'Capital']])
print()
# displaying a single column (Population)
print(population)

Output:

Country     Capital
0  Germany      Berlin
1   France       Paris
2    Spain      Madrid
3    Italy        Rome
4    India       Delhi
5      USA  Washington
6    China     Beijing
7   Poland      Warsaw
8   Russia      Moscow
9  England      London

['84,267,549' '65,534,239' '46,787,468' '60,301,346' '1,404,495,187'
 '334,506,463' '1,449,357,022' '37,771,789' '146,047,418' '68,529,747']

Method 3: Name-Based Indexing with loc()

Instead of selecting the columns by their index, you can also select them by their name using the df.loc[] selecter.

The following example shows how to select the columns Country and Capital from the given DataFrame.

import pandas as pd

data = pd.read_csv('countries.csv')
val = data.loc[:, ['Country', 'Capital']]
print(val)

Output:

Country     Capital
0  Germany      Berlin
1   France       Paris
2    Spain      Madrid
3    Italy        Rome
4    India       Delhi
5      USA  Washington
6    China     Beijing
7   Poland      Warsaw
8   Russia      Moscow
9  England      London

Related Tutorial: Slicing Data from a Pandas DataFrame using .loc and .iloc

Method 4: Using csv Module

csv module is yet another spectacular option in Python that allows you to play with csv files. Let us have a look at the code that helps us to read the given csv file and then read specific columns from it:

import csv

population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile:
    data = csv.DictReader(csvfile)
    for r in data:
        print("Country", ":", "Capital")
        # append values from population column to population list
        population.append(r['Population'])
        # displaying specific columns (Country and Capital)
        print(r['Country'], ":", r['Capital'])
    # display the population list
    print(population)

Output:

Country : Capital
Germany : Berlin
Country : Capital
France : Paris
Country : Capital
Spain : Madrid
Country : Capital
Italy : Rome
Country : Capital
India : Delhi
Country : Capital
USA : Washington
Country : Capital
China : Beijing
Country : Capital
Poland : Warsaw
Country : Capital
Russia : Moscow
Country : Capital
England : London
['84,267,549', '65,534,239', '46,787,468', '60,301,346', '1,404,495,187', '334,506,463', '1,449,357,022', '37,771,789', '146,047,418', '68,529,747']

Explanation:

  • Import the csv module and open up the csv file. Ensure that you feed in the encoding argument as it helps to eliminate any unreadable characters that may occur in the given csv file.
    • with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile
  • Allow Python to read the csv file as a dictionary using csv.Dictreader object.
  • Once the file has been read in the form of a dictionary, you can easily fetch the values from respective columns by using the keys within square bracket notation from the dictionary. Here each column represents the key within the given dictionary.

Bonus: Here’s a quick look at how the DictReader() class looks like:

import csv

population = []
with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile:
    data = csv.DictReader(csvfile)
    for row in data:
        print(row)

Output:

{'Country': 'Germany', 'Capital': 'Berlin', 'Population': '84,267,549', 'Area': '348,560'}
{'Country': 'France', 'Capital': 'Paris', 'Population': '65,534,239', 'Area': '547,557'}
{'Country': 'Spain', 'Capital': 'Madrid', 'Population': '46,787,468', 'Area': '498,800'}
{'Country': 'Italy', 'Capital': 'Rome', 'Population': '60,301,346', 'Area': '294,140'}
{'Country': 'India', 'Capital': 'Delhi', 'Population': '1,404,495,187', 'Area': '2,973,190'}
{'Country': 'USA', 'Capital': 'Washington', 'Population': '334,506,463', 'Area': '9,147,420'}
{'Country': 'China', 'Capital': 'Beijing', 'Population': '1,449,357,022', 'Area': '9,388,211'}
{'Country': 'Poland', 'Capital': 'Warsaw', 'Population': '37,771,789', 'Area': '306,230'}
{'Country': 'Russia', 'Capital': 'Moscow', 'Population': '146,047,418', 'Area': '16,376,870'}
{'Country': 'England', 'Capital': 'London', 'Population': '68,529,747', 'Area': '241,930'}

It is evident from the output that csv.DictReader() returns a dictionary for each row such that the column header is the key while the value in the row is the associated value in the dictionary.

Conclusion

To sum things up, there are majorly four different ways of accessing specific columns from a given csv file:

  • List-Based Indexing.
  • Integer-Based Indexing.
  • Name-Based Indexing.
  • Using csv modules DictReader class.

Feel free to use the one that suits you best. I hope this tutorial helped you. Please subscribe and stay tuned for more interesting tutorials. Happy learning!


Learn Pandas the Fun Way by Solving Code Puzzles

If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).

Coffee Break Pandas Book

It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?

Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.