How to Convert a Log to a CSV File in Python?


A not-so-fictitious problem: Say you’ve created a web application that runs on a dedicated Linux server in the cloud. Thousands of users visit your web app and suddenly … it crashes. Your users start complaining, and you lose revenue. More importantly, you bleed credibility by the hour. Your server is down, so what do you do? 🤯

First, don’t panic. 🛸

Let’s analyze your server logs!

This article shows you how to convert your log file to a CSV file in Python so that you can use it for further processing (e.g., in Pandas or Excel).

Problem Formulation by Example

Given a file my_file.log like this one I pulled from a real IBM server log example:

03/22   08:51:01   INFO   :.main: *************** RSVP Agent started ***************
03/22   08:51:01   INFO   :...locate_configFile: Specified configuration file: /u/user10/rsvpd1.conf
03/22   08:51:01   INFO   :.main: Using log level 511
03/22   08:51:01   INFO   :..settcpimage: Get TCP images rc - EDC8112I Operation not supported on socket.
03/22   08:51:01   INFO   :..settcpimage: Associate with TCP/IP image name = TCPCS

How do you convert this log file to a CSV file in the following standard comma-separated values format?

03/22,08:51:01,INFO,:.main: *************** RSVP Agent started ***************
03/22,08:51:01,INFO,:...locate_configFile: Specified configuration file: /u/user10/rsvpd1.conf
03/22,08:51:01,INFO,:.main: Using log level 511
03/22,08:51:01,INFO,:..settcpimage: Get TCP images rc - EDC8112I Operation not supported on socket.
03/22,08:51:01,INFO,:..settcpimage: Associate with TCP/IP image name = TCPCS

Or, here’s how that would look if you opened it with Excel:

Prettier, isn’t it? Unlike the first representation (log file), this CSV representation is easier to read for (most) human beings. 🤖

Convert Server Log to CSV with Pandas

You can convert a .log file to a CSV file in Python in four simple steps: (1) install the Pandas library, (2) import the Pandas library, (3) read the log file as a DataFrame, and (4) write the DataFrame to the CSV file.

  1. (Optional in shell) pip install pandas
  2. import pandas as pd
  3. df = pd.read_csv('my_file.log', sep=r'\s\s+', engine='python')
  4. df.to_csv('my_file.csv', index=False)

Here’s a minimal example:

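import pandas as pd
# Split each line on runs of two or more whitespace characters (raw string keeps the backslashes intact).
df = pd.read_csv('my_file.log', sep=r'\s\s+', engine='python')
# Write the values comma-separated, without the DataFrame's row index.
df.to_csv('my_file.csv', index=False)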

ℹ️ Note: The regular expression \s\s+ passed as sep matches two or more consecutive whitespace characters, which act as the separator between two CSV values. If your log uses a different separator string, you can define it here.
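To see what this separator does in isolation, here is a small sketch that applies the same regular expression to one line of the sample log with Python's built-in re module. It only illustrates the splitting rule; it is not how Pandas is implemented internally.

```python
import re

# One line from the sample log; its fields are separated by three spaces.
line = "03/22   08:51:01   INFO   :.main: Using log level 511"

# Split on runs of two or more whitespace characters.
# Single spaces inside the message do not match, so the message stays in one piece.
fields = re.split(r"\s\s+", line)
print(fields)
```

The single spaces inside the message are why a plain one-space separator would not work for this log: it would cut the message into many pieces.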

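You specify engine='python' because regular-expression separators such as this one are only supported by Pandas' Python parsing engine, not by the default C engine. Without it, Pandas falls back to the Python engine anyway but emits a ParserWarning.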
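One thing to keep in mind: the sample log has no header row, and by default read_csv() treats the first line as column names. The following sketch is a possible variant, not part of the original recipe. It passes header=None and supplies column names, so every log line is kept as data. The names date, time, level, and message are hypothetical, and the log lines are held in an in-memory buffer only to keep the example self-contained.

```python
import io
import pandas as pd

# Two lines from the sample log, held in memory instead of my_file.log.
sample_log = io.StringIO(
    "03/22   08:51:01   INFO   :.main: Using log level 511\n"
    "03/22   08:51:01   INFO   :..settcpimage: Associate with TCP/IP image name = TCPCS\n"
)

# Hypothetical column names; the log itself does not define any.
columns = ["date", "time", "level", "message"]

# header=None tells Pandas that the first line is data, not a header.
df = pd.read_csv(
    sample_log,
    sep=r"\s\s+",
    engine="python",
    header=None,
    names=columns,
)

# Without a path argument, to_csv() returns the CSV content as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a real file, you would pass 'my_file.log' instead of the buffer and a file name such as 'my_file.csv' to to_csv(). Add header=False to to_csv() if you do not want the column names written as the first CSV line.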

The result of the code is the CSV file shown in the problem formulation above, with the four fields of each log line separated by commas.

You can use this CSV file as input for, say, an Excel sheet or Google Spreadsheet for further processing and analysis.
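If you would rather stay in Python for the analysis, here is a minimal sketch of one possible next step: loading the CSV back into Pandas and counting entries per log level. The column names are hypothetical, and the two CSV rows are embedded as a string so that the example runs on its own.

```python
import io
import pandas as pd

# Two rows in the CSV format shown earlier, held in memory for illustration.
csv_text = (
    "03/22,08:51:01,INFO,:.main: Using log level 511\n"
    "03/22,08:51:01,INFO,:..settcpimage: Associate with TCP/IP image name = TCPCS\n"
)

# Hypothetical column names; header=None marks every row as data.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["date", "time", "level", "message"],
)

# Count the entries per log level and select only the INFO rows.
level_counts = df["level"].value_counts()
info_rows = df[df["level"] == "INFO"]
print(level_counts)
print(len(info_rows))
```

On a real server log you would filter for levels such as ERROR or WARNING in the same way to narrow down what happened before the crash.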

This is what your log file looks like once converted to a CSV and imported into Excel:

And this is how your log file looks as a Pandas DataFrame:

   03/22  ... :.main: *************** RSVP Agent started ***************
0  03/22  ...  :...locate_configFile: Specified configuration...        
1  03/22  ...                        :.main: Using log level 511        
2  03/22  ...  :..settcpimage: Get TCP images rc - EDC8112I O...        
3  03/22  ...  :..settcpimage: Associate with TCP/IP image na...        

[4 rows x 4 columns]

🌍 Related Tutorial: Python Pandas DataFrame to_csv()