Story: This series of articles assumes you are a contractor hired by the NHL (National Hockey League) to produce a CSV file based on Team Stats from 1990-2011.
The data for this series is located on a live website in HTML table format.
💡 Note: Before continuing, we recommend you have at least a basic knowledge of HTML and CSS.
Part 1 focused on:
- Describing HTML Tables.
- Reviewing the NHL website.
- Understanding HTTP Status Codes.
- Connecting to the NHL website using the requests library.
- Viewing the HTML code.
- Closing the Open Connection.
Part 2 focuses on:
- Retrieving Total Number of Pages
- Configuring the Page URL
- Creating a While Loop to Navigate Pages
Part 3 focuses on:
- Looping through the NHL web pages.
- Scraping the data from each page.
- Exporting the data to a CSV file.
Preparation
This article assumes you have installed the following libraries from Part 1:
- The Pandas library.
- The Requests library.
- The Beautiful Soup library.
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time
Total Pages Overview
There are two (2) ways to locate the total number of pages:
- Run Python code to send the HTML code to the terminal window and locate the information needed by scrolling through the HTML code.
- Display the HTML code in the current browser window and use the Inspect tool to locate the required information.
💡 Note: The remainder of these articles use Google Chrome to find the required information (Option 2).
Retrieve Total Pages
Our goal in this section is to retrieve the total pages to scrape. This value will be saved in our Python code to use later.
As indicated on the pagination bar, this value is 24.

To locate the HTML code related to this value, perform the following steps:
- Navigate to the NHL website.
- Scroll down to the pagination bar.
- With your mouse, hover over hyperlink 24.
- Right-mouse click to display a pop-up menu.
- Click to select Inspect. This option opens the HTML code window to the right of the browser window.

The HTML code relating to the selected hyperlink now contains a highlight.

Upon reviewing the HTML code, we can see that the highlighted line is the second-last <li> element in the HTML code. This is confirmed by the </ul> tag, which closes the open <ul> (unordered list) tag shortly after it.
Good to know! Now let’s reference that in our Python code.
web_url = 'https://scrapethissite.com/pages/forms/'
res = requests.get(web_url)
if res:
    soup = BeautifulSoup(res.content, 'html.parser')
    total_pgs = int([li.text for li in soup.find_all('li')][-2].strip())
    print(total_pgs)
    res.close()
else:
    print(f'The following error occurred: {res}')
The highlighted code lines are described below.
- Line [1] does the following:
  - Uses List Comprehension to loop through all <li> tags inside res.content. This content contains the HTML code of the NHL’s home page.
  - Uses negative indexing ([-2]) to retrieve the second-last <li> element on the web page (24), strips the surrounding whitespace, and converts the text to an integer.
- Line [2] outputs the contents of total_pgs to the terminal.
- Line [3] closes the open connection.
💡 Note: You may want to remove Line [2] before continuing.
Output
24
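The list comprehension above relies on the pagination links being the last <li> elements on the page. As a slightly more defensive sketch, the snippet below limits the search to the pagination list and takes the largest numeric label. The sample_html string, the pagination class name, and the get_total_pages() helper are assumptions made for illustration only. Confirm the real markup with the Inspect tool before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the site's pagination bar.
sample_html = """
<ul class="nav"><li>Home</li><li>Lessons</li></ul>
<ul class="pagination">
  <li><a href="?page_num=1">1</a></li>
  <li><a href="?page_num=2">2</a></li>
  <li><a href="?page_num=24">24</a></li>
  <li><a href="?page_num=2" aria-label="Next">»</a></li>
</ul>
"""

def get_total_pages(html):
    """Return the largest page number found in the pagination list."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumed class name: only look inside <ul class="pagination">.
    bar = soup.find('ul', class_='pagination')
    if bar is None:
        return 1  # no pagination bar found, so treat it as a single page
    labels = [li.get_text(strip=True) for li in bar.find_all('li')]
    # Keep numeric labels only, so arrow-style links are ignored.
    numbers = [int(label) for label in labels if label.isdigit()]
    return max(numbers) if numbers else 1

total_pgs = get_total_pages(sample_html)
print(total_pgs)
```

With the sample markup, this prints 24. The same function could be given res.content instead of sample_html once the class name is verified.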
Configure Page URL
The next step is to determine how to properly navigate from page to page while performing the scrape operation.
When you first navigate to the NHL site, the URL in the address bar is the following:
https://www.scrapethissite.com/pages/forms/
Let’s see what happens when we click hyperlink [1] in the pagination bar.

The page reloads, and the URL in the address bar changes to the following:
https://www.scrapethissite.com/pages/forms/?page_num=1
Notice the page number appends to the original URL (?page_num=1).
💡 Note: Click other hyperlinks in the pagination bar to confirm this.
We can use this configuration to loop through all pages to scrape!
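As a small sketch of this URL pattern, the helper below builds the address for any page number. The function name build_page_url() is hypothetical, and urlencode() from the standard library is swapped in here as a convenience in place of manual string concatenation.

```python
from urllib.parse import urlencode

web_url = 'https://www.scrapethissite.com/pages/forms/'

def build_page_url(base_url, page_num):
    """Append the page_num query parameter to the base URL."""
    query = urlencode({'page_num': page_num})
    return f'{base_url}?{query}'

print(build_page_url(web_url, 1))
print(build_page_url(web_url, 24))
```

The first call reproduces the URL shown in the address bar after clicking hyperlink [1].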
Creating a While Loop
The code below incorporates a While Loop to navigate through all pages (URLs) of the NHL website.
web_url = 'https://scrapethissite.com/pages/forms/'
res = requests.get(web_url)
cur_page = 1
if res:
    soup = BeautifulSoup(res.content, 'html.parser')
    total_pgs = int([li.text for li in soup.find_all('li')][-2].strip())
    while cur_page <= total_pgs:
        pg_url = f'{web_url}?page_num={cur_page}'
        print(pg_url)
        cur_page += 1
    res.close()
else:
    print(f'The following error occurred: {res}')
- Line [1] assigns the NHL’s website URL to the web_url variable.
- Line [2] attempts to connect to the NHL’s website using the requests.get() method. The response, which includes the HTTP Status Code, saves to the res variable.
- Line [3] creates a new variable cur_page to keep track of the page we are currently on. This variable is initially set to a value of one (1).
- Line [4] initiates an if statement. If res reports success (for example, a status code of 200), the code inside this statement executes.
  - Line [5] parses the HTML content of the current web page (home page).
  - Line [6] uses List Comprehension and negative indexing to retrieve the total pages to scrape. This value saves to total_pgs.
  - Line [7] initiates a While Loop which repeats as long as cur_page is less than or equal to total_pgs.
    - Line [8] creates a new variable pg_url by combining the variable web_url with the cur_page variable.
    - Line [9] outputs the value of pg_url to the terminal for each loop.
    - Line [10] increases the value of cur_page by one (1).
  - Line [11] closes the open connection.
- Lines [12-13] execute if res reports an error instead of success.
Output (snippet)
https://scrapethissite.com/pages/forms/?page_num=1
...
💡 Note: You may want to remove Line [9] before continuing.
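Because the number of pages is known before the loop starts, the same navigation can also be sketched with range(), which removes the manual counter. The generator name page_urls() is hypothetical, and the total is hard-coded to 24 only so the snippet runs without a network connection.

```python
def page_urls(base_url, total_pgs):
    """Yield one URL per page, from page 1 through total_pgs inclusive."""
    for cur_page in range(1, total_pgs + 1):
        yield f'{base_url}?page_num={cur_page}'

web_url = 'https://scrapethissite.com/pages/forms/'
# 24 is the value read from the pagination bar earlier in this article.
urls = list(page_urls(web_url, 24))

print(len(urls))
print(urls[0])
print(urls[-1])
```

Whether to use a While Loop or range() is largely a matter of taste here. The While Loop keeps the counter visible, while range() makes the inclusive upper bound explicit.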
We’re almost there!
Summary
In this article, you learned how to:
- Use a Web Browser to locate and retrieve Total Pages.
- Configure the URL to loop through all pages of the NHL website.
What’s Next
In Part 3 of this series, you will learn to identify and parse the <table> tags. Finally, we will put this all together to complete our web scraping app.