6 min read
How to read data from a .csv
file and add its column or row to the list? There are three main ways:
- Option 1 (the quickest): use the standard library
- Option 2 (the most preferred): use
pandas.read_csv()
- Option 3 (optional): use
csv.reader()
Short answer
The simplest option to read a .csv
file into a list is to use it with open(“file”) as f:
and apply the actions you need. You should also remember that such an approach has its limitations, as you’ll see in the tutorial.
Prerequisites
To read the .csv
file, I used the following tools:
- Python 3.8
- PyCharm IDE for convenient coding experience
- Sublime Text Editor for a manual check of the
.csv
file
By default, you can read CSV files using other tools or even default, pre-installed programs you have on your machine, so it is just a matter of choice what tools to use. The codebase can be executed anywhere with the same results.
What is the CSV Format?
Nowadays, three main data formats are used for passing data from one machine to another: CSV, XML, and JSON.
The abbreviation CSV stands for “comma-separated values”. As the name implies, it is just a list of elements separated by commas. It is the most straightforward format to transfer data and should be used if
- you need the most compact file size, or
- you have a flat data structure.
Keep in mind that CSV files do not give you such flexibility in presenting the data as the other two options.
Related articles:
Example Task
This is a real-world task in a simplified form. The goal is to read data from CSV file (70 KB) and form a list of all series codes present in the second line of it.
The provided data is an open statistical data from the European Central Bank (ECB) in CSV format and present financial flows during the period. The file consist of three main fields:
- series code
- observed date (period, e.g., 2019Q4, 2020Q1, etc.)
- observed value (data point, float number)
Data Preparation
To focus on the parsing option, I suggest you download and extract a file beforehand. In the examples, the file will be placed on the Desktop, but you can put it anywhere you like.
Script:
import os import wget link = "http://sdw.ecb.europa.eu/export.do? mergeFilter=&removeItem=L&REF_AREA.252=I8&COUNTERPART_AREA.252=W0 &rc=&ec=&legendPub=published&oc=&df=true&DATASET=0&dc=&ACCOUNTING _ENTRY.252=A&node=9689710&showHide=&removedItemList=&pb=&legendNo r=&activeTab=&STO.252=F&STO.252=K7&STO.252=KA&STO.252=LE&legendRe f=reference&REF_SECTOR.252=S1&exportType=csv&ajaxTab=true" path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" wget.download(link, path)
Script breakdown:
import os import wget
Import statements are used to install code base which was written by someone else before and is ready to use just by referring to it. Some them (e.g. wget
) should be additionally installed using similar command:
The following command will install the latest version of a module and its dependencies from the Python Packaging Index:
python -m pip install SomePackage
os
package is used to perform basic operation with files and folders in your operating system.
wget
package is used to download files from websites.
link = "http://sdw.ecb.europa.eu/export.do? mergeFilter=&removeItem=L&REF_AREA.252=I8&COUNTERPART_AREA.252=W0 &rc=&ec=&legendPub=published&oc=&df=true&DATASET=0&dc=&ACCOUNTING _ENTRY.252=A&node=9689710&showHide=&removedItemList=&pb=&legendNo r=&activeTab=&STO.252=F&STO.252=K7&STO.252=KA&STO.252=LE&legendRe f=reference&REF_SECTOR.252=S1&exportType=csv&ajaxTab=true"
The string variable link
is created which represents a direct download link. This link can be easily tested in any web-browser.
path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv"
string variable path
is created which represents a path in your system where files will be downloaded later.
The prefix “f
” before the string makes it an “f-string” which means that you can use other variables in the string by using {placeholders}
. In this case, variable os.environ[‘HOMEPATH’]
refers to system variable (declared in the Windows system by default, not in your python script) and puts it into a string we just created. By default, HOMEPATH
refers to the current user by C:\Users\%user%
(you).
wget.download(link, path)
The function call wget.download()
triggers the file download from previously specified link and saves it by previously specified path.
The result of this step is a ready-to-use CSV file on your Desktop. Now we can parse data from CSV file and extract series codes to list.
Data Exploration
It is a good practice to explore data before you start parsing it. In this case, you can see that series codes are present in the second row of data.csv
.
Option 1 (the Fastest): Use the Standard Library
This is the fastest option of reading a file using the standard library. Assuming the file is prepared and located on your Desktop, you can use the script below. This is the easiest way of getting data on the list. However, it has its drawbacks.
Input:
import os path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" with open(path, "r") as f: print(list(f.readlines()[1].split(","))[1:])
Output:
[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’]
Script breakdown:
with open(path, "r") as f: print(list(f.readlines()[1].split(","))[1:])
The import statement and variable assignment is skipped as it was described previously and the main attention is given to the last statements.
This is a combined statement of three parts:
- The “with” statement, in the general meaning, allows us to define what code block (actions) we want to do with the object while it is “active”. In this case, we want to tell python that it has to do some actions while the file is open, and when all statements are completed, close it.
- The “open” statement allows us to open a file and place it is into Python memory. In this case, we open the previously given file (“path” variable) in “r” mode, which stands for “read”-only mode.
- The “print” statement allows you to see output on your screen. In this case we
- take file object
f
withopen(path, 'r') as f
, - read second line with
f.readlines()[1]
, - split the line by the
,
separator inf.readlines()[1].split(“,”)
, - convert the to list
list(f.readlines()[1].split(“,”))
, - return the list starting from second element as the first one is empty in
list(f.readlines()[1].split(“,”))[1:]
, and - print the result in
print(list(f.readlines()[1].split(“,”))[1:])
.
- take file object
There is no specific documentation as this code base uses standard library which is built-in in Python.
Pros/Cons: Such an approach allows the user to get an instant view of the CSV file and select the required data. You can use this for spot checks and simple transformations. It is important to remember that such an approach has the lowest amount of adjustable settings, and it requires lots of workarounds when transformations are complicated.
Option 2 (the Most Preferred): Use pandas.read_csv()
The most preferred option of reading .csv
file is using the Pandas library (Pandas cheat sheets here). Pandas is a fast, powerful, flexible, and easy to use open-source data analysis and manipulation tool, built on top of the Python programming language.
Pandas is usually used for more advanced data analysis where data is stored in a “DataFrame” which is basically like a table in excel. A DataFrame has a header row and an index column so that you can refer to table values by “column x row” intersection.
Script:
import os import pandas as pd path = f”C:{os.environ[‘HOMEPATH’]}\\Desktop\\data.csv” df = pd.read_csv(path, delimiter=”,”, skiprows=[0]) list = df.columns.to_list()[1:] print(list)
Output:
[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’]
Script breakdown:
df = pd.read_csv(path, delimiter=”,”, skiprows=[0])
In this dataframe, variable df
is created from the .csv
file by executing pandas method read_csv
. In this case the method requires several arguments: file
, delimiter
and skiprows
.
The file is the same as used before. The delimiter is “,” which is a default option for .csv files, and it might be skipped. But it’s good to know that you can use any other delimiter.
list = df.columns.to_list()[1:] print(list)
This line selects the column headers and puts them into a list starting from the second element going forward. The result is printed.
Pros/Cons: Such an approach is relatively fast, visually appealing to the reader, and is fully adjustable using a consistent approach. Comparing to the first option when standard libraries are used, it requires additional packages to be installed. I personally believe that it is not a problem, and such a drawback can be neglected. But another point should not be skipped — the amount of data. This approach is inefficient when you need lots of “side” data, which is useless for your purpose.
Full documentation is available here with more guides and instructions on how to use it.
Option 3 (optional): use csv.reader()
There is also another way how to read .csv
files, which might be useful in certain circumstances. The csv
module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.
Script:
import os import csv path = f"C:{os.environ['HOMEPATH']}\\Desktop\\data.csv" with open(path, 'r') as f: wines = list(csv.reader(f, delimiter=","))[1][1:]
Output:
[‘QSA.Q.N.I8.W0.S1.S1.N.A.F.F._Z._Z.XDC._T.S.V.N._T’,... ‘QSA.Q.N.I8.W0.S1.S1.N.A.LE.F89.T._Z.XDC._T.S.V.N._T’]
Script breakdown:
with open(path, 'r') as f: wines = list(csv.reader(f, delimiter=","))[1][1:]
csv.reader()
is a method which allows you to parse .csv
file with specified delimiter.
After that, we select the second row using first brackets “[1]
” and after that, select all elements from that list starting from second “[1:]
” using slicing.
As this is a standard package, there is documentation at the official python website:
Pros/Cons: Such an approach is relatively simple and has just a few lines of code. On the other hand, it requires an additional package to be installed.
Summary
You should remember that there are different ways of reading data from CSV files. Select the one which suits your needs most or has the best performance and runtime.
Where to Go From Here?
Enough theory. Let’s get some practice!
Coders get paid six figures and more because they can solve problems more effectively using machine intelligence and automation.
To become more successful in coding, solve more real problems for real people. That’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?
You build high-value coding skills by working on practical coding projects!
Do you want to stop learning with toy projects and focus on practical code projects that earn you money and solve real problems for people?
🚀 If your answer is YES!, consider becoming a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.
If you just want to learn about the freelancing opportunity, feel free to watch my free webinar “How to Build Your High-Income Skill Python” and learn how I grew my coding business online and how you can, too—from the comfort of your own home.