Story: This series of articles assumes you are a contractor hired by the NHL (National Hockey League) to produce a CSV file based on Team Stats from 1990-2011.
The data for this series is located on a live website in HTML table format.
💡 Note: Before continuing, we recommend you possess, at minimum, a basic knowledge of HTML and CSS.
Part 1 focuses on:
- Describing HTML Tables.
- Reviewing the NHL website.
- Understanding HTTP Status Codes.
- Connecting to the NHL website using the requests library.
- Viewing the HTML code.
- Closing the Open Connection.
Part 2 focuses on:
- Retrieving the Total Number of Pages.
- Configuring the Page URL.
- Creating a While Loop to Navigate Pages.
Part 3 focuses on:
- Looping through the NHL web pages.
- Scraping the data from each page.
- Exporting the data to a CSV file.
Preparation
- The Pandas library enables access to/from a DataFrame.
- The Requests library provides access to HTTP requests in Python.
- The Beautiful Soup library parses HTML and XML documents.
To install these libraries, navigate to an IDE terminal and execute the commands below at the command prompt. The command prompt used in this example is a dollar sign ($); your terminal prompt may be different.
💡 Note: The time library is built-in and does not require installation. It contains time.sleep(), which is used in Part 3 to set a delay between page scrapes (a short example follows the starter code below).
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install requests
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install beautifulsoup4
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guides for the required libraries.
- How to install Pandas on PyCharm
- How to install Requests on PyCharm
- How to install BeautifulSoup4 on PyCharm
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time
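As mentioned above, the time library requires no extra setup. The snippet below is a minimal sketch of how time.sleep() introduces a pause between iterations; the two-second delay and the page range are arbitrary values for illustration, and the real loop appears in Part 3.

import time

# Pause briefly between iterations to avoid hammering the server.
for page in range(1, 4):          # arbitrary page range for illustration
    print(f'Processing page {page}...')
    time.sleep(2)                 # wait 2 seconds before the next iteration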
What are HTML Tables?
HTML tables offer Web Designers/Developers a way to arrange data into rows and columns. HTML tables are similar to Excel spreadsheets.
HTML tables are made up of:
- a table structure (<table></table>)
- a heading row made up of header cells (<th></th>)
- unlimited rows (<tr></tr>)
- unlimited data cells that form the columns (<td></td>)
In HTML, tables are set up similar to the code below.
<table>
  <tr>
    <th>col 1</th>
    <th>col 2</th>
  </tr>
  <tr>
    <td>data 1</td>
    <td>data 2</td>
  </tr>
</table>
Below is a partial sample of an HTML table. This table is located on the NHL website we will be scraping.

💡 Note: For additional information on HTML tables, click here.
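To see how this structure translates to scraping, here is a minimal sketch that parses the small sample table above with Beautiful Soup. The HTML string is just the example markup, not the live NHL page.

from bs4 import BeautifulSoup

# The sample table markup from the example above.
html = '''
<table>
  <tr><th>col 1</th><th>col 2</th></tr>
  <tr><td>data 1</td><td>data 2</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
headings = [th.text for th in soup.find_all('th')]          # ['col 1', 'col 2']
rows = [[td.text for td in tr.find_all('td')]
        for tr in soup.find_all('tr') if tr.find('td')]     # [['data 1', 'data 2']]
print(headings)
print(rows)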
Website Review
Let’s navigate to the NHL website and review the format.
At first glance, you will notice:
- the web page displays the NHL stats inside a formatted structure (an HTML table).
- a pagination area at the bottom depicting:
  - page hyperlinks from 1 to 24.
  - a next page hyperlink (>>).
  - a Per Page dropdown box displaying 25 records per page (by default).
💡 Note: This series of articles uses the Google Chrome browser.
HTTP Response Codes
When you attempt to connect from your Python code to any URL, an HTTP Response Code returns, indicating the connection status.
This code can be any one of the following:
| Status Code Range | Category |
|---|---|
| 100–199 | Informational responses |
| 200–299 | Successful responses |
| 300–399 | Redirection messages |
| 400–499 | Client error responses |
| 500–599 | Server error responses |
💡 Note: To view a detailed list of HTTP Status Codes, click here.
Connect to NHL Website
Before any scraping can occur, we need to determine if we can successfully connect to this website. We do this using the requests library. If successful, an HTTP Status Code of 200 returns.
Let’s try running this code by performing the following steps:
- Open an IDE terminal.
- Create a new Python file (example: hockey.py).
- Copy and paste the code below into this file.
- Save and run this file.
web_url = 'https://scrapethissite.com/pages/forms/'
res = requests.get(web_url)
print(res)
- Line [1] assigns the NHL’s website URL to the web_url variable.
- Line [2] attempts to connect to the NHL’s website using the requests.get() method. An HTTP Status Code returns and saves to the res variable.
- Line [3] outputs the contents of the res variable to the terminal.
Output:
<Response [200]>
Great news! The connection to the NHL website works!
💡 Note: You may want to remove Line [3] before continuing.
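Beyond printing the response object itself, you can inspect the numeric code and match it against the ranges in the table above. The lines below are a small sketch that assumes the res variable from the code you just ran:

print(res.status_code)         # e.g. 200
print(res.status_code // 100)  # 2, i.e. the 'Successful responses' range
print(res.ok)                  # True for any status code below 400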
HTML Code Overview
The next step is to view the HTML code. This step enables us to locate specific HTML elements/tags we need to scrape the data.
There are two (2) ways to perform this task:
- Run Python code to send the HTML code to the terminal window and locate the required information by scrolling through the HTML code.
- Display the HTML code in the current browser window and use the Inspect tool to locate the required information.
View HTML Code in Terminal
To view the HTML code in a terminal window, navigate to an IDE, and run the following code:
💡 Note: Remember to add in the Required Starter Code.
if res:
    soup = BeautifulSoup(res.content, 'html.parser')
    print(soup.prettify())
else:
    print(f'The following error occurred: {res}')
- Line [1] initiates an if statement. If the res variable indicates a successful connection (HTTP Status Code 200), the code inside this statement executes.
  - Line [2] saves the HTML code of the web page URL (web_url) created earlier to the soup variable.
  - Line [3] outputs the prettify version of the HTML code to the terminal.
- Lines [4-5] execute if the value of the res variable contains anything other than 200 (success).
💡 Note: You may want to remove Line [3] before continuing.
Output:
After running the above code, the visible area of the HTML code in the terminal window is the bottom portion, denoted by the </html> tag.

💡 Note: Scroll up to peruse the entire HTML code.
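If scrolling through the terminal is cumbersome, an alternative is to write the prettified HTML to a local file and open it in your editor. This is a minimal sketch that assumes the soup variable from the code above; the file name hockey_page.html is just an example:

# Write the prettified HTML to a file (assumes `soup` exists as shown above).
with open('hockey_page.html', 'w', encoding='utf-8') as f:
    f.write(soup.prettify())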
View HTML Code in Browser
To view the HTML code in a browser, perform the following steps:
- Open a browser and navigate to the NHL website.
- In any whitespace, right-mouse click to display a pop-up menu.
- Click to select the Inspect menu item.

The HTML code displays on the right-hand side of the browser window.
In this instance, the top part of the HTML code shows, as denoted by the <!DOCTYPE HTML> tag.

Part 2 delves deeper into accessing specific elements/tags now that you are familiar with how to view HTML code.
💡 Note: If you are familiar with HTML and CSS, option one (1) may best suit your needs.
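As a small preview of Part 2, the sketch below (assuming the soup object created earlier) locates the first <table> element on the page and lists its column headings:

stats_table = soup.find('table')                                  # first HTML table on the page
headings = [th.text.strip() for th in stats_table.find_all('th')]
print(headings)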
Close the Connection
In the code above, a connection to the NHL website was established and opened. However, this connection needs to be closed once we are finished with it.
An additional line of code is added to resolve this issue.
web_url = 'https://scrapethissite.com/pages/forms/'
res = requests.get(web_url)

if res:
    soup = BeautifulSoup(res.content, 'html.parser')
    res.close()
else:
    print(f'The following error occurred: {res}')
💡 Note: If successful, a connection is made from the Python code to the NHL website. Remember to close this connection when not in use.
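Alternatively, the response object can be used as a context manager, which releases the connection automatically when the block exits. The sketch below is equivalent to the code above:

import requests
from bs4 import BeautifulSoup

web_url = 'https://scrapethissite.com/pages/forms/'

# The `with` block closes the connection automatically on exit.
with requests.get(web_url) as res:
    if res:
        soup = BeautifulSoup(res.content, 'html.parser')
    else:
        print(f'The following error occurred: {res}')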
Summary
In this article, you learned how to:
- Review the NHL website.
- Understand HTTP Status Codes.
- Connect to the NHL website using the requests library.
- View HTML code in an IDE.
- View HTML code in a Web Browser.
- Close the open connection.
What’s Next
In Part 2 of this series, you will learn to identify elements/tags inside HTML code to create a web scraping app.
