[toc]
Problem: Given a CSV file, how to read only specific column(s) from the csv file? (Reading a specific column from a csv file will yield all the row values pertaining to that column.)
Example: Consier the following csv
file (countries.csv
):
Country,Capital,Population,Area Germany,Berlin,"84,267,549","348,560" France,Paris,"65,534,239","547,557" Spain,Madrid,"46,787,468","498,800" Italy,Rome,"60,301,346","294,140" India,Delhi,"1,404,495,187","2,973,190" USA,Washington,"334,506,463","9,147,420" China,Beijing,"1,449,357,022","9,388,211" Poland,Warsaw,"37,771,789","306,230" Russia,Moscow,"146,047,418","16,376,870" England,London,"68,529,747","241,930"
Question: How will you read the above csv file and display the following columns –
Country
column along with theCapital
column?- All values in the
population
column?
Method 1: Using Pandas
Using the Pandas library is probably the best option if you are dealing with csv files. You can easily read a csv file and store an entire column within a variable.
Code:
import pandas as pd df = pd.read_csv("countries.csv") country = df['Country'] # or # country = df.Country capital = df['Capital'] # or # capital = df.Capital # displaying selected columns (Country and Capital) for x, y in zip(country, capital): print(f"{x} {y}") # displaying a single column (Country) print() print(df['Population'])
Output:
Germany Berlin France Paris Spain Madrid Italy Rome India Delhi USA Washington China Beijing Poland Warsaw Russia Moscow England London 0 84,267,549 1 65,534,239 2 46,787,468 3 60,301,346 4 1,404,495,187 5 334,506,463 6 1,449,357,022 7 37,771,789 8 146,047,418 9 68,529,747 Name: Population, dtype: object
Explanation:
- Read the csv file using
pd.read_csv()
Pandas function. - Save all the information of the columns Country and Capital within independent variables using –
country = df['Country']
- Alternatively, you can also use
country = df.Country
- Alternatively, you can also use
capital = df['Capital']
- Alternatively, you can also use
capital = df.Capital
- Alternatively, you can also use
- To display the country names and their capitals simultaneously, you can bind the two columns,
country
andcapital
, using the zip() function and then display each country along with its capital using a for loop upon the zipped object. - To display all the values in the population column, you can simply use
df['Population']
.
TRIVIAzip()
is a built-in function in Python that takes an arbitrary number of iterables and binds them into a single iterable, a zip
object. It combines the n-th
value of each iterable argument into a tuple.
Read more about zip() here.
➤ List-Based Indexing of a DataFrame
In case you are not comfortable with using zip()
to display multiple columns at once, you have another option. You can simply use list-based indexing to accomplish your goal.
List-based indexing is a technique that allows you to pass multiple column names as a list within the square-bracket selector.
Example:
import pandas as pd df = pd.read_csv("countries.csv") print() print(df[['Country', 'Capital']])
Output:
Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London
Method 2: Integer Based Indexing with iloc
Approach: The idea here is to use the df.iloc[rows, columns].values
to access individual columns from the DataFrame using indexing. Note that the first column always has the index 0, while the second column has index 1, and so on.
rows
is used to select individual rows. Use the slicing colon:
to ensure all rows have been selected.columns
is used to select individual columns.- Use
country = data.iloc[:, 0].values
to save the values of the Country column. capital = data.iloc[:, 1].values
to save the values of the Capital column.population = data.iloc[:, 2].values
to save the values of the Population column.
- Use
import pandas as pd data = pd.read_csv('countries.csv') country = data.iloc[:, 0].values capital = data.iloc[:, 1].values population = data.iloc[:, 2].values # displaying selected columns print(data[['Country', 'Capital']]) print() # displaying a single column (Population) print(population)
Output:
Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London ['84,267,549' '65,534,239' '46,787,468' '60,301,346' '1,404,495,187' '334,506,463' '1,449,357,022' '37,771,789' '146,047,418' '68,529,747']
Method 3: Name-Based Indexing with loc()
Instead of selecting the columns by their index, you can also select them by their name using the df.loc[]
selecter.
The following example shows how to select the columns Country
and Capital
from the given DataFrame.
import pandas as pd data = pd.read_csv('countries.csv') val = data.loc[:, ['Country', 'Capital']] print(val)
Output:
Country Capital 0 Germany Berlin 1 France Paris 2 Spain Madrid 3 Italy Rome 4 India Delhi 5 USA Washington 6 China Beijing 7 Poland Warsaw 8 Russia Moscow 9 England London
Related Tutorial: Slicing Data from a Pandas DataFrame using .loc and .iloc
Method 4: Using csv Module
csv
module is yet another spectacular option in Python that allows you to play with csv files. Let us have a look at the code that helps us to read the given csv file and then read specific columns from it:
import csv population = [] with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for r in data: print("Country", ":", "Capital") # append values from population column to population list population.append(r['Population']) # displaying specific columns (Country and Capital) print(r['Country'], ":", r['Capital']) # display the population list print(population)
Output:
Country : Capital Germany : Berlin Country : Capital France : Paris Country : Capital Spain : Madrid Country : Capital Italy : Rome Country : Capital India : Delhi Country : Capital USA : Washington Country : Capital China : Beijing Country : Capital Poland : Warsaw Country : Capital Russia : Moscow Country : Capital England : London ['84,267,549', '65,534,239', '46,787,468', '60,301,346', '1,404,495,187', '334,506,463', '1,449,357,022', '37,771,789', '146,047,418', '68,529,747']
Explanation:
- Import the
csv
module and open up the csv file. Ensure that you feed in theencoding
argument as it helps to eliminate any unreadable characters that may occur in the given csv file.with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile
- Allow Python to read the csv file as a dictionary using
csv.Dictreader
object. - Once the file has been read in the form of a dictionary, you can easily fetch the values from respective columns by using the keys within square bracket notation from the dictionary. Here each column represents the key within the given dictionary.
Bonus: Here’s a quick look at how the DictReader()
class looks like:
import csv population = [] with open('countries.csv', newline='', encoding='utf-8-sig') as csvfile: data = csv.DictReader(csvfile) for row in data: print(row)
Output:
{'Country': 'Germany', 'Capital': 'Berlin', 'Population': '84,267,549', 'Area': '348,560'}
{'Country': 'France', 'Capital': 'Paris', 'Population': '65,534,239', 'Area': '547,557'}
{'Country': 'Spain', 'Capital': 'Madrid', 'Population': '46,787,468', 'Area': '498,800'}
{'Country': 'Italy', 'Capital': 'Rome', 'Population': '60,301,346', 'Area': '294,140'}
{'Country': 'India', 'Capital': 'Delhi', 'Population': '1,404,495,187', 'Area': '2,973,190'}
{'Country': 'USA', 'Capital': 'Washington', 'Population': '334,506,463', 'Area': '9,147,420'}
{'Country': 'China', 'Capital': 'Beijing', 'Population': '1,449,357,022', 'Area': '9,388,211'}
{'Country': 'Poland', 'Capital': 'Warsaw', 'Population': '37,771,789', 'Area': '306,230'}
{'Country': 'Russia', 'Capital': 'Moscow', 'Population': '146,047,418', 'Area': '16,376,870'}
{'Country': 'England', 'Capital': 'London', 'Population': '68,529,747', 'Area': '241,930'}
It is evident from the output that csv.DictReader()
returns a dictionary for each row such that the column header is the key while the value in the row is the associated value in the dictionary.
Conclusion
To sum things up, there are majorly four different ways of accessing specific columns from a given csv file:
- List-Based Indexing.
- Integer-Based Indexing.
- Name-Based Indexing.
- Using csv modules
DictReader
class.
Feel free to use the one that suits you best. I hope this tutorial helped you. Please subscribe and stay tuned for more interesting tutorials. Happy learning!
Learn Pandas the Fun Way by Solving Code Puzzles
If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).
It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?
Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.