Over your career as a Pythonista, there may be instances where you will work with Flat Files. This file type is an ASCII character-based file, usually with commas (,) separating the fields. Other common field separators are the following:
- Semi-colon (
;
) - Tab character (
\t
) - Colon (
:
) and so on.
This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.
Preparation
Before any data manipulation can occur, one (1) new library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Read CSV File
The flat file below is used for Section 2 and Section 3 of this article. Copy these lines and save them to a file called classics.txt
. Place this file in the current working directory.
π‘ Note: The field separator character in this file is a semi-colon (;
).
UPC;Title;Price;Inventory abbb492978ff656d;The Secret Garden;15.08;274 93379e3a2072a01b;The Metamorphosis;28.59;31 2798974abc8a58a8;Candide;58.63;11 2e69730561ed70ad;Emma;32.93;97 39592d9d72e717c4;Of Mice and Men;47.11;18
With the classics.txt
file saved to the current working directory, the code below reads in the flat file and sends the contents to a DataFrame.
The sep
parameter must exist in this instance. By default, the comma (,
) separator is assumed.
df = pd.read_csv('classics.txt', sep=';', encoding='utf-8') print(df)
- Line [1] reads in the text file and parses the fields using the semi-colon (
;
) separator. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring. The data is then saved to a DataFrame (df
). - Line [2] outputs the DataFrame to the terminal window.
π‘ Note: A UnicodeEncodeError
occurs when a flat-file contains ‘special’ characters, such as characters outside the ASCII range. Click here to view a chart of these characters.
Output
UPC | Title | Price | Inventory | |
0 | abbb492978ff656d | The Secret Garden | 15.08 | 274 |
1 | 93379e3a2072a01b | The Metamorphosis | 28.59 | 31 |
2 | 2798974abc8a58a8 | Candide | 58.63 | 11 |
3 | 2e69730561ed70ad | Emma | 32.93 | 97 |
4 | 39592d9d72e717c4 | Of Mice and Men | 47.11 | 18 |
DataFrame to CSV
Expanding on the code above, let’s add an additional line to save the DataFrame (df
) to a CSV file.
df.to_csv('classics.csv', index=False, encoding='utf-8') print(df)
- Line [1] passes
index=False
to remove the left-hand column numbers (see above). Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring. - Line [2] outputs the DataFrame to the terminal window.
Output
UPC | Title | Price | Inventory |
abbb492978ff656d | The Secret Garden | 15.08 | 274 |
93379e3a2072a01b | The Metamorphosis | 28.59 | 31 |
2798974abc8a58a8 | Candide | 58.63 | 11 |
2e69730561ed70ad | Emma | 32.93 | 97 |
39592d9d72e717c4 | Of Mice and Men | 47.11 | 18 |
Read Table
For this example, create a new text file fiction.txt
.
Use the following data for this file. Save and place this file in the current working directory.
π‘ Note: The separator here is the whitespace parameter. Set your file up in the same format as below.

π‘ Note: The drawback is if any data in any column contains a space, for example, ‘Grey Life,’ an error will occur.
df = pd.read_table('fiction.txt', delim_whitespace=True, index_col=0, encoding='utf-8') print(df)
- Line [1] reads in the text file, sets the field separator (delimiter) to whitespace, and sets the index to column 0. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
- Line [2] outputs the DataFrame to the terminal.
Output
Title | Price | Inventory | |
UPC | |||
3c456328b04a8ee8 | Grey | 48.49 | 23 |
bade9943ee01b63f | Paris | 17.28 | 4 |
9546d537fbf99eb6 | Dreaming | 20.55 | 13 |
a40723994f715420 | Houdini | 30.25 | 7 |
41fc5dce044f16f5 | Girl-Blue | 46.83 | 34 |
To save this table as a DataFrame, run the code below.
df.to_csv('fiction.csv', index=True, encoding='utf-8')
Read FWF
FWF stands for Fixed Width Fields. The read_fwf()
function reads a table of fixed-width formatted lines into a DataFrame.
For this example, create a new text file authors.txt
.
Use the following data for this file. Place this file in the current working directory.

fwidths = [ 9, # Title 19, # Author 6 # Price ] df = pd.read_fwf('authors.txt', widths=fwidths) print(df)
- Line [1] sets the width for each column in
authors.txt
. - Line [2] reads in
authors.txt
and sets the widths of each column to their corresponding item in the widths list. - Line [3] outputs the DataFrame to the terminal.
Output
Title | Author | Price | |
0 | Grey | Steve Smith | 20.88 |
1 | Paris | Audrey Cohill | 23.67 |
2 | Dreaming | Alex Balfour | 10.99 |
3 | Houdini | Paula Greaves | 25.66 |
π Related Tutorial: Python Convert Fixed Width File to CSV