Python Pandas Input/Output – Flat File

Over your career as a Pythonista, there may be instances where you will work with Flat Files. This file type is an ASCII character-based file, usually with commas (,) separating the fields. Other common field separators are the following:

  • Semi-colon (;)
  • Tab character (\t)
  • Colon (:) and so on.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.


Preparation

Before any data manipulation can occur, one (1) new library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd 

Read CSV File

The flat file below is used for Section 2 and Section 3 of this article. Copy these lines and save them to a file called classics.txt. Place this file in the current working directory.

πŸ’‘ Note: The field separator character in this file is a semi-colon (;).

UPC;Title;Price;Inventory
abbb492978ff656d;The Secret Garden;15.08;274
93379e3a2072a01b;The Metamorphosis;28.59;31
2798974abc8a58a8;Candide;58.63;11
2e69730561ed70ad;Emma;32.93;97
39592d9d72e717c4;Of Mice and Men;47.11;18

With the classics.txt file saved to the current working directory, the code below reads in the flat file and sends the contents to a DataFrame.

The sep parameter must exist in this instance. By default, the comma (,) separator is assumed.

df = pd.read_csv('classics.txt', sep=';', encoding='utf-8')
print(df)
  • Line [1] reads in the text file and parses the fields using the semi-colon (;) separator. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring. The data is then saved to a DataFrame (df).
  • Line [2] outputs the DataFrame to the terminal window.

πŸ’‘ Note: A UnicodeEncodeError occurs when a flat-file contains ‘special’ characters, such as characters outside the ASCII range. Click here to view a  chart of these characters.

Output

 UPCTitlePriceInventory
0abbb492978ff656d The Secret Garden 15.08274
193379e3a2072a01b The Metamorphosis 28.5931
22798974abc8a58a8           Candide 58.6311
32e69730561ed70ad              Emma 32.9397
439592d9d72e717c4   Of Mice and Men 47.1118

DataFrame to CSV

Expanding on the code above, let’s add an additional line to save the DataFrame (df) to a CSV file.

df.to_csv('classics.csv', index=False, encoding='utf-8')
print(df)
  • Line [1] passes index=False to remove the left-hand column numbers (see above). Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
  • Line [2] outputs the DataFrame to the terminal window.

Output

UPCTitlePriceInventory
abbb492978ff656d The Secret Garden 15.08274
93379e3a2072a01b The Metamorphosis 28.5931
2798974abc8a58a8           Candide 58.6311
2e69730561ed70ad              Emma 32.9397
39592d9d72e717c4   Of Mice and Men 47.1118

Read Table

For this example, create a new text file fiction.txt.

Use the following data for this file. Save and place this file in the current working directory.

πŸ’‘ Note: The separator here is the whitespace parameter. Set your file up in the same format as below.

πŸ’‘ Note: The drawback is if any data in any column contains a space, for example, ‘Grey Life,’ an error will occur.

df = pd.read_table('fiction.txt',  delim_whitespace=True, index_col=0, encoding='utf-8')
print(df)
  • Line [1] reads in the text file, sets the field separator (delimiter) to whitespace, and sets the index to column 0. Setting the encoding parameter catches and prevents any UnicodeEncodeError from occurring.
  • Line [2] outputs the DataFrame to the terminal.

Output

 Title PriceInventory
UPC 
3c456328b04a8ee8      Grey48.4923
bade9943ee01b63f     Paris17.284
9546d537fbf99eb6  Dreaming20.5513
a40723994f715420   Houdini30.257
41fc5dce044f16f5 Girl-Blue46.8334

To save this table as a DataFrame, run the code below.

df.to_csv('fiction.csv', index=True, encoding='utf-8')

Read FWF

FWF stands for Fixed Width Fields. The read_fwf() function reads a table of fixed-width formatted lines into a DataFrame.

For this example, create a new text file authors.txt.

Use the following data for this file. Place this file in the current working directory.

fwidths = [
    9,  # Title
    19, # Author
    6  # Price
    ]

df = pd.read_fwf('authors.txt', widths=fwidths)
print(df)
  • Line [1] sets the width for each column in authors.txt.
  • Line [2] reads in authors.txt and sets the widths of each column to their corresponding item in the widths list.
  • Line [3] outputs the DataFrame to the terminal.

Output

 TitleAuthorPrice
0Grey   Steve Smith 20.88
1Paris Audrey Cohill 23.67
2Dreaming   Alex Balfour 10.99
3Houdini Paula Greaves 25.66

🌍 Related Tutorial: Python Convert Fixed Width File to CSV