Python Pandas Input/Output – Flat File - Be on the Right Side of Change

Over your career as a Pythonista, there may be instances where you will work with Flat Files. This file type is an ASCII character-based file, usually with commas (,) separating the fields. Other common field separators are the following:

Semi-colon (;)
Tab character (\t)
Colon (:) and so on.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.

Preparation

Before any data manipulation can occur, one (1) new library will require installation.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to install Pandas on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Read CSV File

The flat file below is used for Section 2 and Section 3 of this article. Copy these lines and save them to a file called classics.txt. Place this file in the current working directory.

💡 Note: The field separator character in this file is a semi-colon (;).

UPC;Title;Price;Inventory
abbb492978ff656d;The Secret Garden;15.08;274
93379e3a2072a01b;The Metamorphosis;28.59;31
2798974abc8a58a8;Candide;58.63;11
2e69730561ed70ad;Emma;32.93;97
39592d9d72e717c4;Of Mice and Men;47.11;18

With the classics.txt file saved to the current working directory, the code below reads in the flat file and sends the contents to a DataFrame.

The sep parameter must exist in this instance. By default, the comma (,) separator is assumed.

df = pd.read_csv('classics.txt', sep=';', encoding='utf-8')
print(df)

Line [1] reads in the text file and parses the fields using the semi-colon (;) separator. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring. The data is then saved to a DataFrame (df).
Line [2] outputs the DataFrame to the terminal window.

💡 Note: A UnicodeEncodeError occurs when a flat-file contains ‘special’ characters, such as characters outside the ASCII range. Click here to view a chart of these characters.

Output

	UPC	Title	Price	Inventory
0	abbb492978ff656d	The Secret Garden	15.08	274
1	93379e3a2072a01b	The Metamorphosis	28.59	31
2	2798974abc8a58a8	Candide	58.63	11
3	2e69730561ed70ad	Emma	32.93	97
4	39592d9d72e717c4	Of Mice and Men	47.11	18

DataFrame to CSV

Expanding on the code above, let’s add an additional line to save the DataFrame (df) to a CSV file.

df.to_csv('classics.csv', index=False, encoding='utf-8')
print(df)

Line [1] passes index=False to remove the left-hand column numbers (see above). Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
Line [2] outputs the DataFrame to the terminal window.

Output

UPC	Title	Price	Inventory
abbb492978ff656d	The Secret Garden	15.08	274
93379e3a2072a01b	The Metamorphosis	28.59	31
2798974abc8a58a8	Candide	58.63	11
2e69730561ed70ad	Emma	32.93	97
39592d9d72e717c4	Of Mice and Men	47.11	18

Read Table

For this example, create a new text file fiction.txt.

Use the following data for this file. Save and place this file in the current working directory.

💡 Note: The separator here is the whitespace parameter. Set your file up in the same format as below.

💡 Note: The drawback is if any data in any column contains a space, for example, ‘Grey Life,’ an error will occur.

df = pd.read_table('fiction.txt',  delim_whitespace=True, index_col=0, encoding='utf-8')
print(df)

Line [1] reads in the text file, sets the field separator (delimiter) to whitespace, and sets the index to column 0. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
Line [2] outputs the DataFrame to the terminal.

Output

	Title	Price	Inventory
UPC
3c456328b04a8ee8	Grey	48.49	23
bade9943ee01b63f	Paris	17.28	4
9546d537fbf99eb6	Dreaming	20.55	13
a40723994f715420	Houdini	30.25	7
41fc5dce044f16f5	Girl-Blue	46.83	34

To save this table as a DataFrame, run the code below.

df.to_csv('fiction.csv', index=True, encoding='utf-8')

Read FWF

FWF stands for Fixed Width Fields. The read_fwf() function reads a table of fixed-width formatted lines into a DataFrame.

For this example, create a new text file authors.txt.

Use the following data for this file. Place this file in the current working directory.

fwidths = [
    9,  # Title
    19, # Author
    6  # Price
    ]

df = pd.read_fwf('authors.txt', widths=fwidths)
print(df)

Line [1] sets the width for each column in authors.txt.
Line [2] reads in authors.txt and sets the widths of each column to their corresponding item in the widths list.
Line [3] outputs the DataFrame to the terminal.

Output

	Title	Author	Price
0	Grey	Steve Smith	20.88
1	Paris	Audrey Cohill	23.67
2	Dreaming	Alex Balfour	10.99
3	Houdini	Paula Greaves	25.66

🌍 Related Tutorial: Python Convert Fixed Width File to CSV