Convert HTML Table to CSV in Python


Suppose you have an HTML table, stored in a file or available at a given URL. First, load all HTML tables into your Python script by calling pandas' pd.read_html() and passing the URL of the HTML document. The result is a list of DataFrames, one per HTML table in the document. Second, convert any specific DataFrame to a CSV by calling its df.to_csv() method. Note that pd.read_html() relies on an HTML parser, so lxml (or BeautifulSoup4 together with html5lib) must be installed.

Here's the general example; replace the URL and the output CSV filename with your own:

import pandas as pd

html = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
csv = 'my_file.csv'

# 1. Read all HTML tables from a given URL
tables = pd.read_html(html)

# 2. Write first table, for example, to the CSV file
tables[0].to_csv(csv)
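Because pd.read_html() returns one DataFrame per table, you can also export every table at once. The following is a minimal sketch, assuming a small hypothetical HTML document (wrapped in StringIO so no network access is needed) and hypothetical output names table_0.csv, table_1.csv, and so on, written to a temporary directory:

```python
from io import StringIO
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical HTML document with two tables (stand-in for a real web page)
html_doc = """
<html><body>
  <table>
    <thead><tr><th>Language</th><th>Year</th></tr></thead>
    <tbody><tr><td>Python</td><td>1991</td></tr></tbody>
  </table>
  <table>
    <thead><tr><th>Tool</th><th>Purpose</th></tr></thead>
    <tbody><tr><td>pip</td><td>installs packages</td></tr></tbody>
  </table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> element.
# Wrapping literal HTML in StringIO avoids the deprecated string input path.
tables = pd.read_html(StringIO(html_doc))
print(len(tables))

# Write each table to its own CSV file: table_0.csv, table_1.csv, ...
out_dir = Path(tempfile.mkdtemp())
for i, table in enumerate(tables):
    table.to_csv(out_dir / f"table_{i}.csv", index=False)

print(sorted(p.name for p in out_dir.glob("*.csv")))
```

For a real page, replace StringIO(html_doc) with the URL and pick your own output directory.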

Example – Exporting Python’s Wiki Page Table to CSV

Consider the first descriptive table of the Python Wikipedia page.

You can convert it to a CSV using the approach outlined above:

import pandas as pd


# 1. Read all HTML tables from a given URL
tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')

# 2. Write first table, for example, to the CSV file
tables[0].to_csv('my_file.csv')
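Indexing with [0] depends on the table's position in the page, which can change when the page is edited. pd.read_html() also accepts a match parameter (a string or regular expression) that keeps only the tables whose text matches it. The sketch below uses a small hypothetical HTML snippet in place of the Wikipedia page, with 'Paradigm' as an assumed search string; for a real page, choose text you know appears in your target table:

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second contains "Paradigm"
html_doc = """
<table><tr><th>Site</th><th>Visits</th></tr><tr><td>A</td><td>10</td></tr></table>
<table>
  <thead><tr><th>Property</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>Paradigm</td><td>Multi-paradigm</td></tr>
    <tr><td>Designed by</td><td>Guido van Rossum</td></tr>
  </tbody>
</table>
"""

# match= filters the returned list down to tables whose text matches
tables = pd.read_html(StringIO(html_doc), match="Paradigm")
print(len(tables))

# Without a path, to_csv() returns the CSV text; pass a filename such as
# 'my_file.csv' to write it to disk instead
csv_text = tables[0].to_csv(index=False)
print(csv_text)
```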

So, basically, we convert the input HTML table into an output CSV file.

How to Convert HTML Table in File to CSV File in Python

💬 Challenge: Given a single HTML table stored in the file 'my_file.html', how do you convert that table to a CSV file in Python?

The pandas.read_html() function accepts file paths as well as URLs as arguments. To convert an HTML table file 'my_file.html' to a CSV file 'my_file.csv' in Python, use the following three steps:

  1. Import the pandas library
  2. Read the HTML table as a DataFrame df by calling pd.read_html('my_file.html')[0]; the function returns a list of DataFrames, so [0] selects the first (and only) table.
  3. Write the DataFrame to a CSV by calling df.to_csv('my_file.csv', index=False) if you don’t need an index of row numbers.
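To see what index=False in step 3 actually changes, the sketch below parses a small hypothetical table and calls to_csv() without a path, which returns the CSV text as a string instead of writing a file:

```python
from io import StringIO

import pandas as pd

# Hypothetical single-table HTML used only for illustration
html_doc = """
<table>
  <thead><tr><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><td>Alice</td><td>30</td></tr>
    <tr><td>Bob</td><td>25</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(html_doc))[0]

# to_csv() with no path returns the CSV content as a string
with_index = df.to_csv()               # default: row numbers are written
without_index = df.to_csv(index=False)  # row numbers are dropped

print(with_index)
print(without_index)
```

With the default index=True, the CSV gains an unnamed first column holding the row numbers 0 and 1; with index=False, only the Name and Age columns remain.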

Here’s the concrete code that works:

import pandas as pd


# Select the only (first) table using indexing [0]
df = pd.read_html('my_file.html')[0]

# Write DataFrame to CSV - no index required
df.to_csv('my_file.csv', index=False)

This turns the original HTML table file 'my_file.html' into the converted CSV file 'my_file.csv'.
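If you want to try the conversion end to end without your own file, the sketch below writes a small hypothetical 'my_file.html' into a temporary directory (so no existing files are overwritten), applies steps 2 and 3 from above, and prints the resulting CSV:

```python
from pathlib import Path
import tempfile

import pandas as pd

# Temporary working directory so nothing in the current folder is touched
work_dir = Path(tempfile.mkdtemp())
html_path = work_dir / "my_file.html"
csv_path = work_dir / "my_file.csv"

# Hypothetical contents for 'my_file.html': a single table
html_path.write_text(
    "<table>"
    "<thead><tr><th>Item</th><th>Quantity</th></tr></thead>"
    "<tbody><tr><td>apple</td><td>3</td></tr>"
    "<tr><td>pear</td><td>5</td></tr></tbody>"
    "</table>",
    encoding="utf-8",
)

# Step 2: read the only table; step 3: write it without the row index
df = pd.read_html(str(html_path))[0]
df.to_csv(csv_path, index=False)

print(csv_path.read_text(encoding="utf-8"))
```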

Further Reading

You can learn more about how to read an HTML table into a Pandas DataFrame in the following article:

🌍 Recommended Resource: How to Read HTML Tables with Pandas