This is the first part of a 3-part series on the Python requests library:
- Python Requests Library – Your First HTTP Request in Python
- Python Requests Library – Understanding requests.get() Parameters
- Python Requests Library – Exception Handling & Advanced request.get() Parameters
Syntax
requests.nameofmethod(parameters)
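As a rough illustration of this syntax, the sketch below lists the function names that requests exposes for the common HTTP methods and confirms that each one is callable. No request is sent here; the tuple name METHOD_NAMES is just a label chosen for this example.

```python
import requests

# Each common HTTP method has a matching function on the requests module.
METHOD_NAMES = ('get', 'post', 'put', 'patch', 'delete', 'head', 'options')

for name in METHOD_NAMES:
    function = getattr(requests, name)
    print(f'requests.{name}() is callable: {callable(function)}')

# A real call then follows the pattern requests.nameofmethod(parameters), e.g.
# response = requests.get('https://books.toscrape.com')
```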
Background
Many Python libraries can make HTTP requests. However, the requests library seems to be the most popular.
When the requests library sends a request to a URL, the following occurs:
- A DNS lookup converts the host name in the URL to an IP address (example: 203.0.113.21),
- The requests library sends a request to this IP address,
- The server attempts to validate this request,
- The server returns a status code as shown below.
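The first step in the list above can be sketched with the standard library alone. This is only an approximation of what happens under the hood, not how requests itself is implemented; the helper names hostname_of and resolve are made up for this example.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the host part of a URL. DNS resolves this, not the full URL."""
    return urlsplit(url).hostname


def resolve(url):
    """Look up an IPv4 address for the URL's host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname_of(url))
    except OSError:  # socket.gaierror (failed lookup) is a subclass of OSError
        return None


if __name__ == '__main__':
    url = 'https://books.toscrape.com'
    print(hostname_of(url))  # the host name only, without scheme or path
    print(resolve(url))      # an IP address, or None when offline
```

The IP address printed depends on the network and on where the site is hosted at the time, so it will differ from any example address.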
💡 Note: The URL https://books.toscrape.com, used for some examples in this article, welcomes coders and encourages scraping.
Preparation
Before any requests can occur, one (1) new library will require installation.
- The Requests library allows access to its many methods and makes data manipulation a breeze!
To install this library, navigate to an IDE terminal. At the command prompt, execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install requests
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import requests
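One quick way to confirm that the installation worked is to print the version of the library. This is a small sanity check, not a required step; the version number shown will depend on what pip installed.

```python
import requests

# If the import above succeeds, the library is installed.
print(requests.__version__)
```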
Status Codes
Direct quote from Wikipedia:
HTTP response status codes separate into five classes or categories. The first digit of the status code defines the class of response. The last two digits do not have any classifying or categorization role. These five classes are:
1XX | Informational Response | The request was received, continuing process. |
2XX | Success | The request was successfully received, understood & accepted. |
3XX | Redirection | Further action is needed to complete the request. |
4XX | Client Error | The request contains invalid syntax or incomplete data. |
5XX | Server Error | The server failed to fulfill a valid request. |
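Because only the first digit determines the class, a status code can be classified with integer division. The sketch below mirrors the table above; STATUS_CLASSES and status_class are names invented for this example and are not part of the requests library.

```python
# Map the first digit of a status code to its class, as in the table above.
STATUS_CLASSES = {
    1: 'Informational Response',
    2: 'Success',
    3: 'Redirection',
    4: 'Client Error',
    5: 'Server Error',
}


def status_class(code):
    """Return the class name for an HTTP status code, or 'Unknown'."""
    return STATUS_CLASSES.get(code // 100, 'Unknown')


print(status_class(200))  # Success
print(status_class(404))  # Client Error
print(status_class(503))  # Server Error
```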
The “get” Request: Making a Request
The requests.get() function sends a GET request to a website. It takes a URL as an argument and returns a response object. In this example, the status code of the response indicates whether the request succeeded or failed. If the connection itself fails, the script abruptly ends.
Run this script. If successful, a 2XX status code outputs to the terminal.
response = requests.get('https://books.toscrape.com')
print(response.status_code)
response.close()
- Line [1] attempts to connect to the URL.
- Line [2] outputs the status code of the response.
- Line [3] closes the open connection.
OR
response = requests.get('https://books.toscrape.com')
print(response.status_code == requests.codes.ok)
response.close()
In this version, Line [2] compares the status code against requests.codes.ok, a named constant equal to 200, and outputs True when they match.
Output
200
True
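The two checks above can be combined into one small helper. The sketch below also uses a with block, which closes the connection automatically instead of calling response.close() by hand. The function names is_success and fetch_status are hypothetical, and the 5-second timeout is an arbitrary choice so the call cannot wait forever.

```python
import requests


def is_success(response):
    """True when the status code matches the named constant for 200."""
    return response.status_code == requests.codes.ok


def fetch_status(url):
    """Return (status_code, is_success) for a GET request to url."""
    # The with block closes the connection when it ends.
    with requests.get(url, timeout=5) as response:
        return response.status_code, is_success(response)


if __name__ == '__main__':
    print(fetch_status('https://books.toscrape.com'))
```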
If the connection itself fails (for example, the host cannot be reached), requests.get() raises an exception and the script ends abruptly. A status code other than 200 does not raise an exception by itself. To handle connection failures, wrap the code in a try/except statement.
try:
    response = requests.get('https://books.toscrape.com')
    print('OK')
    response.close()
except requests.exceptions.RequestException:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL.
- Line [3] if successful, OK is output to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails with a requests exception, the code falls to here.
- Line [6] outputs the message Error to the terminal. The script terminates.
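A try/except block only catches problems that raise an exception, and a 4XX or 5XX status code does not raise one unless asked to. One way to treat those codes as errors is response.raise_for_status(). The sketch below assumes that approach; check_status and fetch are names invented for this example, and the 5-second timeout is an arbitrary choice.

```python
import requests


def check_status(response):
    """Return 'OK' for a successful response, else a short error message."""
    try:
        # raise_for_status() raises HTTPError for 4XX and 5XX status codes.
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        return f'HTTP error: {err}'
    return 'OK'


def fetch(url):
    """Request url and report either a connection error or the status check."""
    try:
        response = requests.get(url, timeout=5)
    except requests.exceptions.RequestException as err:
        # Raised when the connection itself fails (DNS, refused, timeout, ...).
        return f'Connection error: {err}'
    with response:
        return check_status(response)


if __name__ == '__main__':
    print(fetch('https://books.toscrape.com'))
```

Separating the connection error from the status check makes it clearer which step failed.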
The “get” Request: Response Content
When the code shown below runs, the HTML code on the requested web page is output to the terminal.
try:
    response = requests.get('https://books.toscrape.com')
    print(response.text)
    response.close()
except requests.exceptions.RequestException:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL.
- Line [3] if successful, the HTML code from the URL is output to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails, the code falls to here.
- Line [6] outputs Error to the terminal. The script terminates.
Output
A small portion of the HTML code displays below.
<article class="product_pod"> <div class="image_container"> <a href="catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html"><img src="media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg" alt="The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics" class="thumbnail"></a> </div> ...
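Printing a whole page can flood the terminal. As a sketch, the helper below returns only the first few characters of response.text. It also shows response.encoding (the encoding used to decode the text) and response.content (the raw bytes). The names preview and fetch_preview and the 200-character limit are assumptions made for this example.

```python
import requests


def preview(response, limit=200):
    """Return the first `limit` characters of the response body as text."""
    return response.text[:limit]


def fetch_preview(url, limit=200):
    """Request url and return a short preview of its HTML, or None on error."""
    try:
        with requests.get(url, timeout=5) as response:
            print(response.encoding)      # encoding used to decode .text
            print(len(response.content))  # size of the raw body in bytes
            return preview(response, limit)
    except requests.exceptions.RequestException:
        return None


if __name__ == '__main__':
    print(fetch_preview('https://books.toscrape.com'))
```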
Using “timeout”
The timeout parameter sets how long the code will wait before timing out for:
- a connection
- a response
In the example below, the connection timeout equals 2 seconds. The response (read) timeout equals 4 seconds.
The best practice is to add the timeout parameter to every request made.
💡 Note: If not entered, requests applies no timeout at all, so the code can hang for minutes or longer while waiting on an unresponsive server.
try:
    response = requests.get('https://books.toscrape.com', timeout=(2, 4))
    print(response.text)
    response.close()
except requests.exceptions.RequestException:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL and sets a timeout.
- Line [3] if the response is successful, the HTML code from the URL outputs to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails, the code falls to here.
- Line [6] outputs Error to the terminal. The script automatically terminates.
Output
See above.
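To tell which of the two timeouts fired, the exceptions can be caught separately. The sketch below assumes the same (connect, read) tuple as above and uses requests.exceptions.ConnectTimeout and requests.exceptions.ReadTimeout; the function name fetch_html and its default values are choices made for this example.

```python
import requests


def fetch_html(url, connect_timeout=2, read_timeout=4):
    """Return the page HTML, or None if the request times out or fails."""
    try:
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.ConnectTimeout:
        print('Error: could not connect within', connect_timeout, 'seconds')
    except requests.exceptions.ReadTimeout:
        print('Error: no response within', read_timeout, 'seconds')
    except requests.exceptions.RequestException as err:
        print('Error:', err)
    else:
        # Only reached when no exception was raised.
        with response:
            return response.text
    return None


if __name__ == '__main__':
    html = fetch_html('https://books.toscrape.com')
    print('No content' if html is None else html[:200])
```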
Summary
In this article, we learned how to:
- Connect to a URL
- Retrieve and display status codes
- Output the HTML code to the terminal
- Use the try/except statement to catch errors
- Set a timeout
- Close any open connections
Next Up
Part 2 will continue to focus on GET as follows:
- The “get” Request: “params”
- The “get” Request: “allow_redirects”
- The “get” Request: “auth”
- The “get” Request: “cert” and “verify”
- The “get” Request: “cookies”