How to Read HTML Tables with Pandas

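The Pandas read_html() function is an easy way to convert HTML tables (e.g., on a web page at a given URL) into Pandas DataFrames. You pass it a URL, a local file path, or a file-like object, and it returns a list of DataFrames, one per HTML table it finds. Under the hood, read_html() relies on an HTML parser: lxml by default, or BeautifulSoup4 together with html5lib as a fallback, so one of these must be installed.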
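To try the "local file path" case offline, the following minimal sketch writes a tiny, made-up HTML file to a temporary directory and reads it back. The file name page.html and its contents are hypothetical, and the sketch assumes lxml (or BeautifulSoup4 plus html5lib) is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical HTML content: one table with a header row and one data row.
html = "<table><tr><th>Name</th></tr><tr><td>pandas</td></tr></table>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "page.html"  # hypothetical file name
    path.write_text(html, encoding="utf-8")

    # read_html() accepts a local file path just like a URL.
    tables = pd.read_html(path)

print(len(tables))
print(tables[0])
```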

The following code, for example, reads all tables of the Wikipedia Python article into a list of DataFrames (one df per HTML table):

import pandas as pd

tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')
print(f'Number of tables: {len(tables)}')
# Number of tables: 13 (at the time of writing; the count changes as the article is edited)

The type of the return value is a list of DataFrames:

print(type(tables))
# <class 'list'>

print(type(tables[0]))
# <class 'pandas.core.frame.DataFrame'>
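To experiment with read_html() without a network connection, you can pass it a file-like object instead of a URL. The sketch below assumes a small, made-up HTML snippet containing two tables; it wraps the markup in io.StringIO because recent Pandas versions deprecate passing literal HTML strings directly. It also shows that a top row consisting only of <th> cells is used as the column header.

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML document with two tables (hypothetical data).
html = """
<table>
  <tr><th>Language</th><th>Year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Java</td><td>1995</td></tr>
</table>
<table>
  <tr><th>Tool</th></tr>
  <tr><td>pandas</td></tr>
</table>
"""

# Wrap the markup in a file-like object instead of passing the raw string.
tables = pd.read_html(StringIO(html))

print(len(tables))  # 2
print(tables[0])    # header row taken from the <th> cells
print(tables[1])
```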

Full Example

The first table in this list is the infobox shown at the top of the Wikipedia article. Its HTML source is reproduced in the Appendix below.

And this is the resulting DataFrame, i.e., the first element of the list returned by Pandas’ read_html():

>>> tables[0]
                                                    0                                                  1
0                                                 NaN                                                NaN
1                                                 NaN                                                NaN
2                                            Paradigm  Multi-paradigm: object-oriented,[1] procedural...
3                                         Designed by                                   Guido van Rossum
4                                           Developer                         Python Software Foundation
5                                      First appeared                  20 February 1991; 31 years ago[2]
6                                                 NaN                                                NaN
7                                      Stable release             3.10.6[3] / 2 August 2022; 16 days ago
8                                     Preview release          3.11.0rc1[4] / 8 August 2022; 10 days ago
9                                   Typing discipline  Duck, dynamic, strong typing;[5] gradual (sinc...
10                                                 OS  Windows, macOS, Linux/UNIX, Android[7][8] and ...
11                                            License                 Python Software Foundation License
12                                Filename extensions  .py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),...
13                                            Website                                         python.org
14                              Major implementations                              Major implementations
15  CPython, PyPy, Stackless Python, MicroPython, ...  CPython, PyPy, Stackless Python, MicroPython, ...
16                                           Dialects                                           Dialects
17                      Cython, RPython, Starlark[12]                      Cython, RPython, Starlark[12]
18                                      Influenced by                                      Influenced by
19  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...
20                                         Influenced                                         Influenced
21  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...
22                    Python Programming at Wikibooks                    Python Programming at Wikibooks
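Two patterns stand out in this output. Rows whose cells hold no text (for example, the logo image) come back as NaN, and cells that span both columns (colspan="2" in the Appendix, such as "Major implementations") are repeated in each column. A minimal cleanup sketch, using a hypothetical and heavily shortened copy of the infobox rather than the live page, could look like this:

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened infobox mimicking the Appendix structure:
# an image-only row, label/value rows, and a colspan="2" section.
html = """
<table>
  <tr><td colspan="2"><img src="logo.png"></td></tr>
  <tr><th>Designed by</th><td>Guido van Rossum</td></tr>
  <tr><th>Website</th><td>python.org</td></tr>
  <tr><th colspan="2">Dialects</th></tr>
  <tr><td colspan="2">Cython, RPython, Starlark</td></tr>
</table>
"""

raw = pd.read_html(StringIO(html))[0]  # no <th>-only top row, so columns are 0 and 1

# Drop rows where every cell is empty (parsed as NaN).
clean = raw.dropna(how="all")
# Drop rows whose colspan="2" cell was repeated in both columns.
clean = clean[clean[0] != clean[1]]
# Use the label column as the index to get a key/value Series.
clean = clean.set_index(0)[1]
print(clean)
```

Dropping the all-NaN rows first matters, because NaN != NaN evaluates to True and those rows would otherwise survive the column comparison.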

ℹ️ Note: The pandas.read_html() function returns a list of DataFrames, one DataFrame per HTML table. So, tables[0] returns the first table in the HTML document, tables[1] returns the second table in the HTML document, and so on. You can get the number of tables in the document by wrapping the result in the len() function like so: len(pd.read_html(...)).
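Indexing by position (tables[0], tables[1], ...) breaks when a page adds or removes a table. The match parameter of read_html() keeps only the tables whose text matches a given string or regular expression. The sketch below uses made-up HTML wrapped in io.StringIO:

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Stable release".
html = """
<table><tr><th>Key</th><th>Value</th></tr>
<tr><td>Website</td><td>python.org</td></tr></table>
<table><tr><th>Key</th><th>Value</th></tr>
<tr><td>Stable release</td><td>3.10.6</td></tr></table>
"""

# Selecting by position depends on the page layout ...
by_index = pd.read_html(StringIO(html))[1]

# ... while `match` returns only tables containing matching text.
matches = pd.read_html(StringIO(html), match="Stable release")
print(len(matches))
print(matches[0])
```

Similarly, the attrs parameter filters tables by their HTML attributes; for the infobox in the Appendix, attrs={"class": "infobox vevent"} should select it (with the default lxml parser, the attribute value is compared as a whole string). If no table matches, read_html() raises a ValueError.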

A practical application of this approach is converting an HTML table to a CSV file: read the table with pandas.read_html() and write the resulting DataFrame with the df.to_csv() method.
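A minimal sketch of this HTML-to-CSV conversion, assuming a small, made-up table wrapped in io.StringIO; the output file name languages.csv is hypothetical:

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML table to convert.
html = """
<table>
  <tr><th>Language</th><th>Designed by</th></tr>
  <tr><td>Python</td><td>Guido van Rossum</td></tr>
  <tr><td>Perl</td><td>Larry Wall</td></tr>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# Without a path, to_csv() returns the CSV text; index=False omits the row index.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead, pass a path (hypothetical file name):
# df.to_csv("languages.csv", index=False)
```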


🌍 Learn More: Reading and Writing HTML with Pandas

Appendix

This is the HTML code of the scraped HTML table (example):

<table class="infobox vevent"><caption class="infobox-title summary">Python</caption><tbody><tr><td colspan="2" class="infobox-image"><a href="/wiki/File:Python-logo-notext.svg" class="image"><img alt="Python-logo-notext.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/121px-Python-logo-notext.svg.png" decoding="async" width="121" height="121" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/182px-Python-logo-notext.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/242px-Python-logo-notext.svg.png 2x" data-file-width="110" data-file-height="110"></a></td></tr><tr><td colspan="2" class="infobox-full-data"><div style="text-align:center;"></div></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Programming_paradigm" title="Programming paradigm">Paradigm</a></th><td class="infobox-data"><a href="/wiki/Multi-paradigm_programming_language" class="mw-redirect" title="Multi-paradigm programming language">Multi-paradigm</a>: <a href="/wiki/Object-oriented_programming" title="Object-oriented programming">object-oriented</a>,<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup> <a href="/wiki/Procedural_programming" title="Procedural programming">procedural</a> (<a href="/wiki/Imperative_programming" title="Imperative programming">imperative</a>), <a href="/wiki/Functional_programming" title="Functional programming">functional</a>, <a href="/wiki/Structured_programming" title="Structured programming">structured</a>, <a href="/wiki/Reflective_programming" title="Reflective programming">reflective</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_design" title="Software design">Designed&nbsp;by</a></th><td class="infobox-data"><a href="/wiki/Guido_van_Rossum" title="Guido van Rossum">Guido van Rossum</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_developer" 
class="mw-redirect" title="Software developer">Developer</a></th><td class="infobox-data organiser"><a href="/wiki/Python_Software_Foundation" title="Python Software Foundation">Python Software Foundation</a></td></tr><tr><th scope="row" class="infobox-label">First&nbsp;appeared</th><td class="infobox-data">20&nbsp;February 1991<span class="noprint">; 31 years ago</span><span style="display:none">&nbsp;(<span class="bday dtstart published updated">1991-02-20</span>)</span><sup id="cite_ref-alt-sources-history_2-0" class="reference"><a href="#cite_note-alt-sources-history-2">[2]</a></sup></td></tr><tr><td colspan="2" class="infobox-full-data"><link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r1066479718"></td></tr><tr><th scope="row" class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle" title="Software release life cycle">Stable release</a></th><td class="infobox-data"><div style="margin:0px;">3.10.6<sup id="cite_ref-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3_3-0" class="reference"><a href="#cite_note-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3-3">[3]</a></sup>&nbsp;<a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a>
   / 2 August 2022<span class="noprint">; 16 days ago</span><span style="display:none">&nbsp;(<span class="bday dtstart published updated">2 August 2022</span>)</span></div></td></tr><tr><th scope="row" class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle#Beta" title="Software release life cycle">Preview release</a></th><td class="infobox-data"><div style="margin:0px;">3.11.0rc1<sup id="cite_ref-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3_4-0" class="reference"><a href="#cite_note-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3-4">[4]</a></sup>&nbsp;<a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a>
   / 8 August 2022<span class="noprint">; 10 days ago</span><span style="display:none">&nbsp;(<span class="bday dtstart published updated">8 August 2022</span>)</span></div></td></tr><tr style="display:none"><td colspan="2">
</td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Type_system" title="Type system">Typing discipline</a></th><td class="infobox-data"><a href="/wiki/Duck_typing" title="Duck typing">Duck</a>, <a href="/wiki/Dynamic_typing" class="mw-redirect" title="Dynamic typing">dynamic</a>, <a href="/wiki/Strong_and_weak_typing" title="Strong and weak typing">strong typing</a>;<sup id="cite_ref-5" class="reference"><a href="#cite_note-5">[5]</a></sup> <a href="/wiki/Gradual_typing" title="Gradual typing">gradual</a> (since 3.5, but ignored in <a href="/wiki/CPython" title="CPython">CPython</a>)<sup id="cite_ref-6" class="reference"><a href="#cite_note-6">[6]</a></sup></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Operating_system" title="Operating system">OS</a></th><td class="infobox-data"><a href="/wiki/Windows" class="mw-redirect" title="Windows">Windows</a>, <a href="/wiki/MacOS" title="MacOS">macOS</a>, <a href="/wiki/Linux" title="Linux">Linux/UNIX</a>, <a href="/wiki/Android_(operating_system)" title="Android (operating system)">Android</a><sup id="cite_ref-7" class="reference"><a href="#cite_note-7">[7]</a></sup><sup id="cite_ref-8" class="reference"><a href="#cite_note-8">[8]</a></sup> and more<sup id="cite_ref-9" class="reference"><a href="#cite_note-9">[9]</a></sup></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_license" title="Software license">License</a></th><td class="infobox-data"><a href="/wiki/Python_Software_Foundation_License" title="Python Software Foundation License">Python Software Foundation License</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Filename_extension" title="Filename extension">Filename extensions</a></th><td class="infobox-data">.py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),<sup id="cite_ref-10" class="reference"><a href="#cite_note-10">[10]</a></sup> .pyo (prior to 3.5)<sup id="cite_ref-11" class="reference"><a 
href="#cite_note-11">[11]</a></sup></td></tr><tr><th scope="row" class="infobox-label">Website</th><td class="infobox-data"><span class="url"><a rel="nofollow" class="external text" href="https://www.python.org/">python.org</a></span></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Major <a href="/wiki/Programming_language_implementation" title="Programming language implementation">implementations</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/CPython" title="CPython">CPython</a>, <a href="/wiki/PyPy" title="PyPy">PyPy</a>, <a href="/wiki/Stackless_Python" title="Stackless Python">Stackless Python</a>, <a href="/wiki/MicroPython" title="MicroPython">MicroPython</a>, <a href="/wiki/CircuitPython" title="CircuitPython">CircuitPython</a>, <a href="/wiki/IronPython" title="IronPython">IronPython</a>, <a href="/wiki/Jython" title="Jython">Jython</a></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;"><a href="/wiki/Programming_language#Dialects,_flavors_and_implementations" title="Programming language">Dialects</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Cython" title="Cython">Cython</a>, <a href="/wiki/PyPy#RPython" title="PyPy">RPython</a>, <a href="/wiki/Bazel_(software)" title="Bazel (software)">Starlark</a><sup id="cite_ref-12" class="reference"><a href="#cite_note-12">[12]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced by</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/ABC_(programming_language)" title="ABC (programming language)">ABC</a>,<sup id="cite_ref-faq-created_13-0" class="reference"><a href="#cite_note-faq-created-13">[13]</a></sup> <a href="/wiki/Ada_(programming_language)" title="Ada (programming language)">Ada</a>,<sup id="cite_ref-14" class="reference"><a href="#cite_note-14">[14]</a></sup> <a href="/wiki/ALGOL_68" title="ALGOL 68">ALGOL 68</a>,<sup 
id="cite_ref-98-interview_15-0" class="reference"><a href="#cite_note-98-interview-15">[15]</a></sup> <a href="/wiki/APL_(programming_language)" title="APL (programming language)">APL</a>,<sup id="cite_ref-python.org_16-0" class="reference"><a href="#cite_note-python.org-16">[16]</a></sup> <a href="/wiki/C_(programming_language)" title="C (programming language)">C</a>,<sup id="cite_ref-AutoNT-1_17-0" class="reference"><a href="#cite_note-AutoNT-1-17">[17]</a></sup> <a href="/wiki/C%2B%2B" title="C++">C++</a>,<sup id="cite_ref-classmix_18-0" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/CLU_(programming_language)" title="CLU (programming language)">CLU</a>,<sup id="cite_ref-effbot-call-by-object_19-0" class="reference"><a href="#cite_note-effbot-call-by-object-19">[19]</a></sup> <a href="/wiki/Dylan_(programming_language)" title="Dylan (programming language)">Dylan</a>,<sup id="cite_ref-AutoNT-2_20-0" class="reference"><a href="#cite_note-AutoNT-2-20">[20]</a></sup> <a href="/wiki/Haskell_(programming_language)" class="mw-redirect" title="Haskell (programming language)">Haskell</a>,<sup id="cite_ref-AutoNT-3_21-0" class="reference"><a href="#cite_note-AutoNT-3-21">[21]</a></sup> <a href="/wiki/Icon_(programming_language)" title="Icon (programming language)">Icon</a>,<sup id="cite_ref-AutoNT-4_22-0" class="reference"><a href="#cite_note-AutoNT-4-22">[22]</a></sup> <a href="/wiki/Lisp_(programming_language)" title="Lisp (programming language)">Lisp</a>,<sup id="cite_ref-AutoNT-6_23-0" class="reference"><a href="#cite_note-AutoNT-6-23">[23]</a></sup> <span class="nowrap"><a href="/wiki/Modula-3" title="Modula-3">Modula-3</a></span>,<sup id="cite_ref-classmix_18-1" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/Perl" title="Perl">Perl</a>, <a href="/wiki/Standard_ML" title="Standard ML">Standard ML</a>, <a href="/wiki/Visual_Basic" title="Visual Basic">VB</a><sup id="cite_ref-python.org_16-1" 
class="reference"><a href="#cite_note-python.org-16">[16]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Apache_Groovy" title="Apache Groovy">Apache Groovy</a>, <a href="/wiki/Boo_(programming_language)" title="Boo (programming language)">Boo</a>, <a href="/wiki/Cobra_(programming_language)" title="Cobra (programming language)">Cobra</a>, <a href="/wiki/CoffeeScript" title="CoffeeScript">CoffeeScript</a>,<sup id="cite_ref-24" class="reference"><a href="#cite_note-24">[24]</a></sup> <a href="/wiki/D_(programming_language)" title="D (programming language)">D</a>, <a href="/wiki/F_Sharp_(programming_language)" title="F Sharp (programming language)">F#</a>, <a href="/wiki/Genie_(programming_language)" title="Genie (programming language)">Genie</a>,<sup id="cite_ref-25" class="reference"><a href="#cite_note-25">[25]</a></sup> <a href="/wiki/Go_(programming_language)" title="Go (programming language)">Go</a>, <a href="/wiki/JavaScript" title="JavaScript">JavaScript</a>,<sup id="cite_ref-26" class="reference"><a href="#cite_note-26">[26]</a></sup><sup id="cite_ref-27" class="reference"><a href="#cite_note-27">[27]</a></sup> <a href="/wiki/Julia_(programming_language)" title="Julia (programming language)">Julia</a>,<sup id="cite_ref-Julia_28-0" class="reference"><a href="#cite_note-Julia-28">[28]</a></sup> <a href="/wiki/Nim_(programming_language)" title="Nim (programming language)">Nim</a>, <a href="/wiki/Ring_(programming_language)" title="Ring (programming language)">Ring</a>,<sup id="cite_ref-The_Ring_programming_language_and_other_languages_29-0" class="reference"><a href="#cite_note-The_Ring_programming_language_and_other_languages-29">[29]</a></sup> <a href="/wiki/Ruby_(programming_language)" title="Ruby (programming language)">Ruby</a>,<sup id="cite_ref-bini_30-0" class="reference"><a href="#cite_note-bini-30">[30]</a></sup> <a 
href="/wiki/Swift_(programming_language)" title="Swift (programming language)">Swift</a><sup id="cite_ref-lattner2014_31-0" class="reference"><a href="#cite_note-lattner2014-31">[31]</a></sup></td></tr><tr><td colspan="2" class="infobox-below hlist" style="border-top: 1px solid #aaa; padding-top: 3px;">
<ul><li><a href="/wiki/File:Wikibooks-logo-en-noslogan.svg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/16px-Wikibooks-logo-en-noslogan.svg.png" decoding="async" width="16" height="16" class="noviewer" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/24px-Wikibooks-logo-en-noslogan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/32px-Wikibooks-logo-en-noslogan.svg.png 2x" data-file-width="400" data-file-height="400"></a> <a href="https://en.wikibooks.org/wiki/Python_Programming" class="extiw" title="wikibooks:Python Programming">Python Programming</a> at Wikibooks</li></ul>
</td></tr></tbody></table>