A not-so-fictitious problem: Say you’ve created a web application that runs on a dedicated Linux server in the cloud. Thousands of users visit your web app and suddenly … it crashes. Your users start complaining, and you lose revenue. More importantly, you bleed credibility by the hour. Your server is down, so what do you do?
First, don’t panic.
Let’s analyze your server logs!
This article shows you how to convert your log file to a CSV file in Python so that you can use it for further processing (e.g., in Pandas or Excel).
Problem Formulation by Example
Given a file my_file.log like this one I pulled from a real IBM server log example:
03/22 08:51:01 INFO :.main: *************** RSVP Agent started ***************
03/22 08:51:01 INFO :...locate_configFile: Specified configuration file: /u/user10/rsvpd1.conf
03/22 08:51:01 INFO :.main: Using log level 511
03/22 08:51:01 INFO :..settcpimage: Get TCP images rc - EDC8112I Operation not supported on socket.
03/22 08:51:01 INFO :..settcpimage: Associate with TCP/IP image name = TCPCS
How can you convert this log file to a CSV file in the following standard comma-separated values format?
03/22,08:51:01,INFO,:.main: *************** RSVP Agent started ***************
03/22,08:51:01,INFO,:...locate_configFile: Specified configuration file: /u/user10/rsvpd1.conf
03/22,08:51:01,INFO,:.main: Using log level 511
03/22,08:51:01,INFO,:..settcpimage: Get TCP images rc - EDC8112I Operation not supported on socket.
03/22,08:51:01,INFO,:..settcpimage: Associate with TCP/IP image name = TCPCS
Or, here’s how that would look if you opened it with Excel:
Prettier, isn’t it? Unlike the first representation (log file), this CSV representation is easier to read for (most) human beings.
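To make the target format concrete, here is a minimal standard-library sketch of the transformation for a single line. It assumes every log line starts with three whitespace-separated fields (date, time, level) followed by a free-form message. The helper name log_line_to_row is hypothetical and used only for illustration.

```python
import csv
import io


def log_line_to_row(line):
    """Split one log line into [date, time, level, message]."""
    # maxsplit=3 keeps the free-form message (which contains spaces) in one piece.
    return line.strip().split(maxsplit=3)


sample = "03/22 08:51:01 INFO :.main: Using log level 511"
row = log_line_to_row(sample)

# csv.writer takes care of quoting, e.g., if a message contains a comma.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
csv_line = buffer.getvalue().strip()
print(csv_line)
```

Using csv.writer instead of joining the fields with commas by hand means a message that happens to contain a comma is quoted rather than split into extra columns.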
Convert Server Log to CSV with Pandas
You can convert a .log file to a CSV file in Python in four simple steps: (1) install the Pandas library, (2) import the Pandas library, (3) read the log file as a DataFrame, and (4) write the DataFrame to the CSV file.
- (Optional, in your shell) pip install pandas
- import pandas as pd
- df = pd.read_csv('my_file.log', sep=r'\s\s+', engine='python')
- df.to_csv('my_file.csv', index=False)
Here’s a minimal example:
import pandas as pd
df = pd.read_csv('my_file.log', sep=r'\s\s+', engine='python')
df.to_csv('my_file.csv', index=False)
Note: The regular expression \s\s+ passed as sep specifies two or more consecutive whitespace characters as the separator between two CSV values. If your log uses a different separator, you can define it here.
You specify engine='python' to tell Pandas to use its Python parsing engine, which supports regular-expression separators. The default C engine does not.
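To see what this separator pattern does on its own, here is a small sketch using Python's re module. Both sample strings are made up for illustration. Whether your own log uses two or more spaces between fields is an assumption you should check first.

```python
import re

# Hypothetical line whose fields are separated by two or more spaces.
double_spaced = "03/22  08:51:01  INFO  :.main: Using log level 511"
fields = re.split(r"\s\s+", double_spaced)
print(fields)

# With only single spaces between fields, the pattern never matches,
# so the whole line comes back as one value.
single_spaced = "03/22 08:51:01 INFO :.main: Using log level 511"
unsplit = re.split(r"\s\s+", single_spaced)
print(unsplit)
```

The single spaces inside the message survive in the first case because the pattern requires at least two whitespace characters in a row.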
The result of the code is the following CSV file:
You can use this CSV file as input for, say, an Excel sheet or Google Spreadsheet for further processing and analysis.
This is what your log file looks like when converted to a CSV and imported into Excel:
And this is how your log file looks as a Pandas DataFrame:
   03/22  ...  :.main: *************** RSVP Agent started ***************
0  03/22  ...  :...locate_configFile: Specified configuration...
1  03/22  ...  :.main: Using log level 511
2  03/22  ...  :..settcpimage: Get TCP images rc - EDC8112I O...
3  03/22  ...  :..settcpimage: Associate with TCP/IP image na...

[4 rows x 4 columns]
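Note that the DataFrame has only four rows although the log has five lines: by default, read_csv treats the first line as the header. Here is a minimal sketch of one way to keep every line as data. It assumes the same two-or-more-whitespace separator and reads from a made-up in-memory log. The column names (date, time, level, message) are hypothetical and do not come from the log itself.

```python
import io

import pandas as pd

# Hypothetical in-memory log with fields separated by two or more spaces.
log_text = (
    "03/22  08:51:01  INFO  :.main: Using log level 511\n"
    "03/22  08:51:01  INFO  :..settcpimage: Associate with TCP/IP image name = TCPCS\n"
)

df = pd.read_csv(
    io.StringIO(log_text),
    sep=r"\s\s+",
    engine="python",
    header=None,  # the first line is data, not a header
    names=["date", "time", "level", "message"],  # assumed column names
)
print(df.shape)
```

With named columns, the resulting CSV gets a readable header row when you write it with to_csv().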
Related Tutorial: Python Pandas DataFrame to_csv()