The Pandas read_html() function is an easy way to convert an HTML table (e.g., one stored at a given URL) to a Pandas DataFrame. You pass it a URL or file path as a string, and it returns a list of DataFrames, each representing one table found at that location.
The following code, for example, reads all tables of the Wikipedia article on Python into a list of DataFrames (one df per HTML table):
import pandas as pd

tables = pd.read_html('https://en.wikipedia.org/wiki/Python_(programming_language)')
print(f'Number of tables: {len(tables)}')
# Number of tables: 13
The type of the return value is a list of DataFrames:
print(type(tables))
# <class 'list'>
print(type(tables[0]))
# <class 'pandas.core.frame.DataFrame'>
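The same behavior can be tried without a network connection. The following is a minimal sketch, assuming a small made-up HTML document (the two tables and their contents are hypothetical and not taken from the Wikipedia article) and an installed HTML parser such as lxml, which read_html() relies on:

```python
import io

import pandas as pd

# Hypothetical two-table document, used here in place of a live URL.
html = """
<table>
  <tr><th>Language</th><th>First appeared</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Ruby</td><td>1995</td></tr>
</table>
<table>
  <tr><th>Implementation</th></tr>
  <tr><td>CPython</td></tr>
</table>
"""

# Wrapping the string in StringIO lets read_html() treat it like a file.
tables = pd.read_html(io.StringIO(html))

print(len(tables))    # one DataFrame per <table> element
print(type(tables))
print(tables[0])
```

Each element of the list is an ordinary DataFrame, so the usual attributes such as tables[0].columns and tables[0].shape are available right away.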
Full Example
The first table in the tables list is the infobox of the Wikipedia article. Its HTML code is given in the Appendix below.
And this is the resulting DataFrame, the first element of the list returned by Pandas’ read_html():
>>> tables[0]
                                                    0                                                  1
0                                                 NaN                                                NaN
1                                                 NaN                                                NaN
2                                            Paradigm  Multi-paradigm: object-oriented,[1] procedural...
3                                         Designed by                                   Guido van Rossum
4                                           Developer                         Python Software Foundation
5                                      First appeared                  20 February 1991; 31 years ago[2]
6                                                 NaN                                                NaN
7                                      Stable release             3.10.6[3] / 2 August 2022; 16 days ago
8                                     Preview release          3.11.0rc1[4] / 8 August 2022; 10 days ago
9                                   Typing discipline  Duck, dynamic, strong typing;[5] gradual (sinc...
10                                                 OS  Windows, macOS, Linux/UNIX, Android[7][8] and ...
11                                            License                 Python Software Foundation License
12                                Filename extensions  .py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),...
13                                            Website                                         python.org
14                              Major implementations                              Major implementations
15  CPython, PyPy, Stackless Python, MicroPython, ...  CPython, PyPy, Stackless Python, MicroPython, ...
16                                           Dialects                                           Dialects
17                      Cython, RPython, Starlark[12]                      Cython, RPython, Starlark[12]
18                                      Influenced by                                      Influenced by
19  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...  ABC,[13] Ada,[14] ALGOL 68,[15] APL,[16] C,[17...
20                                         Influenced                                         Influenced
21  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...  Apache Groovy, Boo, Cobra, CoffeeScript,[24] D...
22                    Python Programming at Wikibooks                    Python Programming at Wikibooks
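The result contains rows that are entirely NaN (they come from image and spacer rows in the infobox) and has no column names. One possible way to tidy such a two-column table is sketched below. It assumes a small stand-in DataFrame with a few rows copied from the result above rather than the live Wikipedia data, and the variable names are only illustrative:

```python
import pandas as pd

# Stand-in for tables[0]: a handful of rows in the same two-column shape.
nan = float("nan")
df = pd.DataFrame(
    [
        [nan, nan],
        ["Designed by", "Guido van Rossum"],
        ["Developer", "Python Software Foundation"],
        [nan, nan],
        ["Website", "python.org"],
    ]
)

# Drop rows in which every cell is NaN, then renumber the index.
cleaned = df.dropna(how="all").reset_index(drop=True)

# Use the label column (0) as the index so values can be looked up by name.
info = cleaned.set_index(0)[1]
print(info["Designed by"])
```

Whether this cleanup is appropriate depends on the table: for a regular data table with a header row, the NaN rows may carry meaning and should not be dropped blindly.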
ℹ️ Note: The pandas.read_html() function returns a list of DataFrames, one DataFrame per HTML table. So, tables[0] returns the first table in the HTML document, tables[1] returns the second table in the HTML document, and so on. You can get the number of tables in the document by wrapping the result in the len() function like so: len(pd.read_html(...)).
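Picking a table by its position works, but the position can change when the page is edited. As a hedged alternative, read_html() also accepts a match argument (only tables containing the given text or regex are returned) and an attrs argument (only tables with the given HTML attributes are returned). The sketch below assumes a made-up document with two tables; the class names and cell contents are hypothetical:

```python
import io

import pandas as pd

# Hypothetical document with two tables that differ in class and content.
html = """
<table class="infobox">
  <tr><th>Paradigm</th><td>Multi-paradigm</td></tr>
  <tr><th>Designed by</th><td>Guido van Rossum</td></tr>
</table>
<table class="wikitable">
  <tr><th>Version</th><th>Released</th></tr>
  <tr><td>Python 3.10</td><td>2021</td></tr>
</table>
"""

# Without filters, every table in the document is returned.
all_tables = pd.read_html(io.StringIO(html))

# Keep only tables whose text matches the given string or regex.
only_versions = pd.read_html(io.StringIO(html), match="Version")

# Keep only tables with the given HTML attributes.
only_infobox = pd.read_html(io.StringIO(html), attrs={"class": "infobox"})

print(len(all_tables), len(only_versions), len(only_infobox))
```

Both filtered calls still return a list, so the single remaining table is accessed with index 0.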
An interesting application of this approach, discussed in this article, is to convert a given HTML table to a CSV file by combining the pandas.read_html() function with the df.to_csv() method.
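The HTML-to-CSV conversion mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the HTML string is a made-up stand-in for a real page, and the CSV is produced as a string so that nothing is written to disk.

```python
import io

import pandas as pd

# Hypothetical one-table document standing in for a real page.
html = """
<table>
  <tr><th>Language</th><th>First appeared</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Ruby</td><td>1995</td></tr>
</table>
"""

# Take the first (and only) table from the returned list.
df = pd.read_html(io.StringIO(html))[0]

# With no path argument, to_csv() returns the CSV as a string.
# index=False omits the DataFrame's row numbers from the output.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path as the first argument, for example df.to_csv('table.csv', index=False), where the file name is only a placeholder.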
🌍 Learn More: Reading and Writing HTML with Pandas
Appendix
This is the HTML code of the scraped HTML table (example):
<table class="infobox vevent"><caption class="infobox-title summary">Python</caption><tbody><tr><td colspan="2" class="infobox-image"><a href="/wiki/File:Python-logo-notext.svg" class="image"><img alt="Python-logo-notext.svg" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/121px-Python-logo-notext.svg.png" decoding="async" width="121" height="121" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/182px-Python-logo-notext.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Python-logo-notext.svg/242px-Python-logo-notext.svg.png 2x" data-file-width="110" data-file-height="110"></a></td></tr><tr><td colspan="2" class="infobox-full-data"><div style="text-align:center;"></div></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Programming_paradigm" title="Programming paradigm">Paradigm</a></th><td class="infobox-data"><a href="/wiki/Multi-paradigm_programming_language" class="mw-redirect" title="Multi-paradigm programming language">Multi-paradigm</a>: <a href="/wiki/Object-oriented_programming" title="Object-oriented programming">object-oriented</a>,<sup id="cite_ref-1" class="reference"><a href="#cite_note-1">[1]</a></sup> <a href="/wiki/Procedural_programming" title="Procedural programming">procedural</a> (<a href="/wiki/Imperative_programming" title="Imperative programming">imperative</a>), <a href="/wiki/Functional_programming" title="Functional programming">functional</a>, <a href="/wiki/Structured_programming" title="Structured programming">structured</a>, <a href="/wiki/Reflective_programming" title="Reflective programming">reflective</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_design" title="Software design">Designed by</a></th><td class="infobox-data"><a href="/wiki/Guido_van_Rossum" title="Guido van Rossum">Guido van Rossum</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_developer" class="mw-redirect" 
title="Software developer">Developer</a></th><td class="infobox-data organiser"><a href="/wiki/Python_Software_Foundation" title="Python Software Foundation">Python Software Foundation</a></td></tr><tr><th scope="row" class="infobox-label">First appeared</th><td class="infobox-data">20 February 1991<span class="noprint">; 31 years ago</span><span style="display:none"> (<span class="bday dtstart published updated">1991-02-20</span>)</span><sup id="cite_ref-alt-sources-history_2-0" class="reference"><a href="#cite_note-alt-sources-history-2">[2]</a></sup></td></tr><tr><td colspan="2" class="infobox-full-data"><link rel="mw-deduplicated-inline-style" href="mw-data:TemplateStyles:r1066479718"></td></tr><tr><th scope="row" class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle" title="Software release life cycle">Stable release</a></th><td class="infobox-data"><div style="margin:0px;">3.10.6<sup id="cite_ref-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3_3-0" class="reference"><a href="#cite_note-wikidata-7f169d99022038b3a6e5d41083301f64776ffb60-v3-3">[3]</a></sup> <a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a> / 2 August 2022<span class="noprint">; 16 days ago</span><span style="display:none"> (<span class="bday dtstart published updated">2 August 2022</span>)</span></div></td></tr><tr><th scope="row" 
class="infobox-label" style="white-space: nowrap;"><a href="/wiki/Software_release_life_cycle#Beta" title="Software release life cycle">Preview release</a></th><td class="infobox-data"><div style="margin:0px;">3.11.0rc1<sup id="cite_ref-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3_4-0" class="reference"><a href="#cite_note-wikidata-ffe8152c4aa79586b276ccdeaca9829daa2bc5af-v3-4">[4]</a></sup> <a href="https://www.wikidata.org/wiki/Q28865?uselang=en#P348" title="Edit this on Wikidata"><img alt="Edit this on Wikidata" src="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/10px-OOjs_UI_icon_edit-ltr-progressive.svg.png" decoding="async" width="10" height="10" style="vertical-align: text-top" srcset="//upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/15px-OOjs_UI_icon_edit-ltr-progressive.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg/20px-OOjs_UI_icon_edit-ltr-progressive.svg.png 2x" data-file-width="20" data-file-height="20"></a> / 8 August 2022<span class="noprint">; 10 days ago</span><span style="display:none"> (<span class="bday dtstart published updated">8 August 2022</span>)</span></div></td></tr><tr style="display:none"><td colspan="2"> </td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Type_system" title="Type system">Typing discipline</a></th><td class="infobox-data"><a href="/wiki/Duck_typing" title="Duck typing">Duck</a>, <a href="/wiki/Dynamic_typing" class="mw-redirect" title="Dynamic typing">dynamic</a>, <a href="/wiki/Strong_and_weak_typing" title="Strong and weak typing">strong typing</a>;<sup id="cite_ref-5" class="reference"><a href="#cite_note-5">[5]</a></sup> <a href="/wiki/Gradual_typing" title="Gradual typing">gradual</a> (since 3.5, but ignored in <a href="/wiki/CPython" title="CPython">CPython</a>)<sup id="cite_ref-6" class="reference"><a href="#cite_note-6">[6]</a></sup></td></tr><tr><th scope="row" 
class="infobox-label"><a href="/wiki/Operating_system" title="Operating system">OS</a></th><td class="infobox-data"><a href="/wiki/Windows" class="mw-redirect" title="Windows">Windows</a>, <a href="/wiki/MacOS" title="MacOS">macOS</a>, <a href="/wiki/Linux" title="Linux">Linux/UNIX</a>, <a href="/wiki/Android_(operating_system)" title="Android (operating system)">Android</a><sup id="cite_ref-7" class="reference"><a href="#cite_note-7">[7]</a></sup><sup id="cite_ref-8" class="reference"><a href="#cite_note-8">[8]</a></sup> and more<sup id="cite_ref-9" class="reference"><a href="#cite_note-9">[9]</a></sup></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Software_license" title="Software license">License</a></th><td class="infobox-data"><a href="/wiki/Python_Software_Foundation_License" title="Python Software Foundation License">Python Software Foundation License</a></td></tr><tr><th scope="row" class="infobox-label"><a href="/wiki/Filename_extension" title="Filename extension">Filename extensions</a></th><td class="infobox-data">.py, .pyi, .pyc, .pyd, .pyw, .pyz (since 3.5),<sup id="cite_ref-10" class="reference"><a href="#cite_note-10">[10]</a></sup> .pyo (prior to 3.5)<sup id="cite_ref-11" class="reference"><a href="#cite_note-11">[11]</a></sup></td></tr><tr><th scope="row" class="infobox-label">Website</th><td class="infobox-data"><span class="url"><a rel="nofollow" class="external text" href="https://www.python.org/">python.org</a></span></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Major <a href="/wiki/Programming_language_implementation" title="Programming language implementation">implementations</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/CPython" title="CPython">CPython</a>, <a href="/wiki/PyPy" title="PyPy">PyPy</a>, <a href="/wiki/Stackless_Python" title="Stackless Python">Stackless Python</a>, <a href="/wiki/MicroPython" title="MicroPython">MicroPython</a>, <a 
href="/wiki/CircuitPython" title="CircuitPython">CircuitPython</a>, <a href="/wiki/IronPython" title="IronPython">IronPython</a>, <a href="/wiki/Jython" title="Jython">Jython</a></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;"><a href="/wiki/Programming_language#Dialects,_flavors_and_implementations" title="Programming language">Dialects</a></th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Cython" title="Cython">Cython</a>, <a href="/wiki/PyPy#RPython" title="PyPy">RPython</a>, <a href="/wiki/Bazel_(software)" title="Bazel (software)">Starlark</a><sup id="cite_ref-12" class="reference"><a href="#cite_note-12">[12]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced by</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/ABC_(programming_language)" title="ABC (programming language)">ABC</a>,<sup id="cite_ref-faq-created_13-0" class="reference"><a href="#cite_note-faq-created-13">[13]</a></sup> <a href="/wiki/Ada_(programming_language)" title="Ada (programming language)">Ada</a>,<sup id="cite_ref-14" class="reference"><a href="#cite_note-14">[14]</a></sup> <a href="/wiki/ALGOL_68" title="ALGOL 68">ALGOL 68</a>,<sup id="cite_ref-98-interview_15-0" class="reference"><a href="#cite_note-98-interview-15">[15]</a></sup> <a href="/wiki/APL_(programming_language)" title="APL (programming language)">APL</a>,<sup id="cite_ref-python.org_16-0" class="reference"><a href="#cite_note-python.org-16">[16]</a></sup> <a href="/wiki/C_(programming_language)" title="C (programming language)">C</a>,<sup id="cite_ref-AutoNT-1_17-0" class="reference"><a href="#cite_note-AutoNT-1-17">[17]</a></sup> <a href="/wiki/C%2B%2B" title="C++">C++</a>,<sup id="cite_ref-classmix_18-0" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/CLU_(programming_language)" title="CLU (programming language)">CLU</a>,<sup 
id="cite_ref-effbot-call-by-object_19-0" class="reference"><a href="#cite_note-effbot-call-by-object-19">[19]</a></sup> <a href="/wiki/Dylan_(programming_language)" title="Dylan (programming language)">Dylan</a>,<sup id="cite_ref-AutoNT-2_20-0" class="reference"><a href="#cite_note-AutoNT-2-20">[20]</a></sup> <a href="/wiki/Haskell_(programming_language)" class="mw-redirect" title="Haskell (programming language)">Haskell</a>,<sup id="cite_ref-AutoNT-3_21-0" class="reference"><a href="#cite_note-AutoNT-3-21">[21]</a></sup> <a href="/wiki/Icon_(programming_language)" title="Icon (programming language)">Icon</a>,<sup id="cite_ref-AutoNT-4_22-0" class="reference"><a href="#cite_note-AutoNT-4-22">[22]</a></sup> <a href="/wiki/Lisp_(programming_language)" title="Lisp (programming language)">Lisp</a>,<sup id="cite_ref-AutoNT-6_23-0" class="reference"><a href="#cite_note-AutoNT-6-23">[23]</a></sup> <span class="nowrap"><a href="/wiki/Modula-3" title="Modula-3">Modula-3</a></span>,<sup id="cite_ref-classmix_18-1" class="reference"><a href="#cite_note-classmix-18">[18]</a></sup> <a href="/wiki/Perl" title="Perl">Perl</a>, <a href="/wiki/Standard_ML" title="Standard ML">Standard ML</a>, <a href="/wiki/Visual_Basic" title="Visual Basic">VB</a><sup id="cite_ref-python.org_16-1" class="reference"><a href="#cite_note-python.org-16">[16]</a></sup></td></tr><tr><th colspan="2" class="infobox-header" style="background-color: #eee;">Influenced</th></tr><tr><td colspan="2" class="infobox-full-data"><a href="/wiki/Apache_Groovy" title="Apache Groovy">Apache Groovy</a>, <a href="/wiki/Boo_(programming_language)" title="Boo (programming language)">Boo</a>, <a href="/wiki/Cobra_(programming_language)" title="Cobra (programming language)">Cobra</a>, <a href="/wiki/CoffeeScript" title="CoffeeScript">CoffeeScript</a>,<sup id="cite_ref-24" class="reference"><a href="#cite_note-24">[24]</a></sup> <a href="/wiki/D_(programming_language)" title="D (programming language)">D</a>, <a 
href="/wiki/F_Sharp_(programming_language)" title="F Sharp (programming language)">F#</a>, <a href="/wiki/Genie_(programming_language)" title="Genie (programming language)">Genie</a>,<sup id="cite_ref-25" class="reference"><a href="#cite_note-25">[25]</a></sup> <a href="/wiki/Go_(programming_language)" title="Go (programming language)">Go</a>, <a href="/wiki/JavaScript" title="JavaScript">JavaScript</a>,<sup id="cite_ref-26" class="reference"><a href="#cite_note-26">[26]</a></sup><sup id="cite_ref-27" class="reference"><a href="#cite_note-27">[27]</a></sup> <a href="/wiki/Julia_(programming_language)" title="Julia (programming language)">Julia</a>,<sup id="cite_ref-Julia_28-0" class="reference"><a href="#cite_note-Julia-28">[28]</a></sup> <a href="/wiki/Nim_(programming_language)" title="Nim (programming language)">Nim</a>, <a href="/wiki/Ring_(programming_language)" title="Ring (programming language)">Ring</a>,<sup id="cite_ref-The_Ring_programming_language_and_other_languages_29-0" class="reference"><a href="#cite_note-The_Ring_programming_language_and_other_languages-29">[29]</a></sup> <a href="/wiki/Ruby_(programming_language)" title="Ruby (programming language)">Ruby</a>,<sup id="cite_ref-bini_30-0" class="reference"><a href="#cite_note-bini-30">[30]</a></sup> <a href="/wiki/Swift_(programming_language)" title="Swift (programming language)">Swift</a><sup id="cite_ref-lattner2014_31-0" class="reference"><a href="#cite_note-lattner2014-31">[31]</a></sup></td></tr><tr><td colspan="2" class="infobox-below hlist" style="border-top: 1px solid #aaa; padding-top: 3px;"> <ul><li><a href="/wiki/File:Wikibooks-logo-en-noslogan.svg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/16px-Wikibooks-logo-en-noslogan.svg.png" decoding="async" width="16" height="16" class="noviewer" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/24px-Wikibooks-logo-en-noslogan.svg.png 
1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/d/df/Wikibooks-logo-en-noslogan.svg/32px-Wikibooks-logo-en-noslogan.svg.png 2x" data-file-width="400" data-file-height="400"></a> <a href="https://en.wikibooks.org/wiki/Python_Programming" class="extiw" title="wikibooks:Python Programming">Python Programming</a> at Wikibooks</li></ul> </td></tr></tbody></table>