To convert an HTML table, stored in a file or available at a given URL, to a CSV file in Python, use two steps. First, load all HTML tables into your Python script by calling Pandas’
pd.read_html() and passing the URL or file path of the HTML document. The result is a list of DataFrames, one per HTML table in the document. Second, convert any specific DataFrame to a CSV by calling the df.to_csv() method.
Here’s a general example. Replace the URL and the output CSV filename with your own:
import pandas as pd

html = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
csv = 'my_file.csv'

# 1. Read all HTML tables from a given URL
tables = pd.read_html(html)

# 2. Write first table, for example, to the CSV file
tables[0].to_csv(csv)
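Because pd.read_html() returns a list, you choose the table you want by position, or narrow the list down with the match parameter, which keeps only tables containing the given text. The following is a minimal sketch, assuming a made-up two-table HTML snippet (wrapped in StringIO so it runs without a network connection) and an installed HTML parser such as lxml. The sample data and variable names are purely illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML document with two tables, used only for illustration
HTML = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
</table>
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td>Amsterdam</td><td>Netherlands</td></tr>
</table>
"""

# read_html() returns one DataFrame per <table> element
tables = pd.read_html(StringIO(HTML))
print(len(tables))

# Pick a table by its position in the list
first = tables[0]

# Or keep only the tables whose text matches a string or regex
cities = pd.read_html(StringIO(HTML), match="City")[0]

# Without a file name, to_csv() returns the CSV as a string
csv_text = first.to_csv(index=False)
print(csv_text)
```

If you are not sure which index holds the table you need, printing each DataFrame's columns in a loop is usually the quickest way to find it.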
Example – Exporting Python’s Wiki Page Table to CSV
Given the first descriptive table of the Python wiki page:
You can convert it to a CSV file by using the approach outlined above:
import pandas as pd

# 1. Read all HTML tables from a given URL
tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')

# 2. Write first table, for example, to the CSV file
tables[0].to_csv('my_file.csv')
So, basically we convert the following input table (HTML):
to the following output:
How to Convert HTML Table in File to CSV File in Python
💬 Challenge: Given a single HTML table stored in a file 'my_file.html', how do you convert that table to a CSV file in Python?
The pandas.read_html() function accepts both file paths and URLs as arguments. To convert an HTML table file 'my_file.html' to a CSV file 'my_file.csv' in Python, use the following three steps:
- Import the pandas library
- Read the HTML table as a DataFrame
- Write the DataFrame to a CSV by calling df.to_csv('my_file.csv', index=False) if you don’t need an index of row numbers.
Here’s the concrete code that works:
import pandas as pd

# Select the only (first) table using indexing
df = pd.read_html('my_file.html')[0]

# Write DataFrame to CSV - no index required
df.to_csv('my_file.csv', index=False)
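To try the three steps without preparing a file by hand, you can let the script create a small HTML file first. This is only a sketch: the sample table, the temporary directory, and the helper name html_file_to_csv are assumptions made for illustration, and an HTML parser such as lxml must be installed for pd.read_html() to work.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample table, used only for illustration
SAMPLE_HTML = """
<table>
  <tr><th>Language</th><th>Year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Go</td><td>2009</td></tr>
</table>
"""


def html_file_to_csv(html_path, csv_path):
    """Write the first table found in html_path to csv_path (hypothetical helper)."""
    # read_html() raises a ValueError if the file contains no table
    tables = pd.read_html(str(html_path))
    df = tables[0]
    df.to_csv(csv_path, index=False)
    return df


with tempfile.TemporaryDirectory() as tmp:
    html_path = Path(tmp) / "my_file.html"
    csv_path = Path(tmp) / "my_file.csv"
    html_path.write_text(SAMPLE_HTML, encoding="utf-8")

    df = html_file_to_csv(html_path, csv_path)

    # Read the CSV back as plain text to inspect the result
    csv_lines = csv_path.read_text(encoding="utf-8").splitlines()

print(csv_lines)
```

Wrapping the conversion in a small function makes it easy to reuse for several files, for example inside a loop over a folder of HTML exports.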
This is the original HTML table file
This is the converted CSV file
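To see what the index=False argument changes in the converted file, compare the two outputs side by side. The sketch below uses a small hand-built DataFrame (made-up sample data) instead of a parsed HTML table, so it runs without any HTML parser installed.

```python
import pandas as pd

# Made-up sample data standing in for a parsed HTML table
df = pd.DataFrame({"Language": ["Python", "Go"], "Year": [1991, 2009]})

# Default: the row index becomes an extra, unnamed first column
with_index = df.to_csv()

# index=False: only the table's own columns are written
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Language,Year
print(without_index.splitlines()[0])  # Language,Year
```

The leading comma in the first header line is the empty name of the index column. If you open such a file in a spreadsheet, it shows up as an unnamed column of row numbers.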
You can learn more about how to read an HTML table into a Pandas DataFrame in the following article:
🌍 Recommended Resource: How to Read HTML Tables with Pandas
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.