Syntax
requests.get(url, **kwargs)
You can replace **kwargs with one or more of the following keyword arguments, comma-separated:
Parameter | Required | Description |
---|---|---|
url | Required | The URL of the request |
params | Optional | Send data using a URL query string. Dictionary, list of tuples, or bytes. |
allow_redirects | Optional | By default, True: redirects are followed. If False, redirection to another website or another web page on the same site is prevented. |
auth | Optional | Often referred to as Basic Authentication. By default, this value is None: no authentication required. The format is a tuple with two elements. |
cert | Optional | Path to a client-side SSL certificate file (or a (cert, key) tuple) to send with the request. |
cookies | Optional | Dictionary of cookies sent to a specified URL. By default, the value is None: no cookies sent. |
headers | Optional | Dictionary of HTTP headers to send to the specified URL. By default, this value is None. |
proxies | Optional | Dictionary mapping protocols to proxy URLs. Proxies hide your IP address from the outside world. |
stream | Optional | By default, False: the response content downloads immediately. If True, the content is streamed. |
timeout | Optional | Seconds to wait before timing out. Accepts a single value or a (connect timeout, read timeout) tuple. |
verify | Optional | Boolean or string to verify the server’s TLS certificate. Default is True . |
Preparation
Before any requests can occur, one new library requires installation.
- The Requests library allows access to its many methods and makes data manipulation a breeze!
To install this library, navigate to an IDE terminal and execute the command below. The terminal used in this example uses a dollar sign ($) as the command prompt; your terminal prompt may differ.
$ pip install requests
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import requests
Background
The coder can connect, access, and perform various data manipulation tasks by using this library. There are many libraries around that make HTTP requests. However, the requests library seems to be the most popular.
When the requests library sends a URL, the following occurs:
- A DNS lookup converts the URL to an IP address (for example, 212.245.123.21).
- The requests library sends a request to this IP address.
- The server attempts to validate this request.
- The server returns a status code, as shown below.
💡 Note: The URL https://books.toscrape.com, used for some examples in this article, welcomes coders and encourages scraping.
Status Codes
Direct quote from Wikipedia:
HTTP response status codes are separated into five classes or categories. The first digit of the status code defines the class of response. The last two digits do not have any classifying or categorization role. These five classes are:
1XX | Informational Response | The request was received, continuing process. |
2XX | Success | The request was successfully received, understood & accepted. |
3XX | Redirection | Further action is needed to complete the request. |
4XX | Client Error | The request contains invalid syntax or incomplete data. |
5XX | Server Error | The server failed to fulfill a valid request. |
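The first-digit rule above can be expressed directly in code. The helper below is an illustrative sketch, not part of the requests library:

```python
def status_class(code):
    """Map an HTTP status code to its class via its first digit."""
    classes = {
        1: 'Informational Response',
        2: 'Success',
        3: 'Redirection',
        4: 'Client Error',
        5: 'Server Error',
    }
    return classes[code // 100]

print(status_class(200))  # Success
print(status_class(404))  # Client Error
```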
Making a Request
The requests.get() method connects to a website, taking a URL as an argument. In this example, a status code returns, indicating the status of the connection (success/failure). If the connection is invalid, the script abruptly ends.
Run this script. If successful, a status code starting with 2XX outputs to the terminal.
response = requests.get('https://books.toscrape.com')
print(response.status_code)
response.close()
- Line [1] attempts to connect to the URL.
- Line [2] outputs the status code.
- Line [3] closes the open connection.
OR
print(requests.codes.ok)
Output
200
As mentioned above, if your status code is other than 200, there is a good chance the script will fail. To prevent this, wrap the code in a try/except statement.
try:
    response = requests.get('https://books.toscrape.com')
    print('OK')
    response.close()
except:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL.
- Line [3] if successful, OK is output to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails, the code falls to here.
- Line [6] outputs the message Error to the terminal. The script terminates.
Response Content
When the code shown below runs, the HTML code on the requested web page is output to the terminal.
try:
    response = requests.get('https://books.toscrape.com')
    print(response.text)
    response.close()
except:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL.
- Line [3] if successful, the HTML code from the URL outputs to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails, the code falls to here.
- Line [6] outputs Error to the terminal. The script terminates.
Output
A small portion of the HTML code displays below.
<article class="product_pod"> <div class="image_container"> <a href="catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html"><img src="media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg" alt="The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics" class="thumbnail"></a> </div> ...
Parameters
requests.get() “timeout”
This parameter allows the coder to set how long the code will wait before timing out for:
- a connection
- a response
In the example below, the connection time equals 2 seconds. The response time equals 4 seconds.
The best practice is to add the timeout parameter to every request made.
💡 Note: If not entered, the code can hang for up to two minutes before crashing. Browser-dependent.
try:
    response = requests.get('https://books.toscrape.com', timeout=(2, 4))
    print(response.text)
    response.close()
except:
    print('Error')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] performs a GET request to connect to the URL and sets a timeout.
- Line [3] if the response is successful, the HTML code from the URL outputs to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If the try statement fails, the code falls to here.
- Line [6] outputs Error to the terminal. The script automatically terminates.
Output
See above.
requests.get() “params”
At some point, you may need to send data using a URL query string. If the query is hard-coded, the format would be similar to below.
Example: https://somewebsite.com?key1=val&key2=val
💡 Note: A question mark (?) precedes the first key-value pair. If passing more than one pair, separate additional pairs with an ampersand (&).
The requests library allows you to easily pass these arguments as one of the following data types:
- a dictionary (used in this example)
- a list of tuples, or
- bytes
For this example, the test website httpbin is the URL.
key_vals = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('https://httpbin.org/get', params=key_vals)
print(response.url)
response.close()
- Line [1] assigns two key:value pairs to a dictionary.
- Line [2] attempts to connect to the URL and passes the key_vals dictionary to params.
- Line [3] outputs the URL with the contents of key_vals appended.
- Line [4] closes the open connection.
Output
https://httpbin.org/get?key1=value1&key2=value2
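To see how requests encodes each of these data types without touching the network, a request can be built and prepared but never sent. The sketch below uses the list-of-tuples form mentioned above:

```python
import requests

# Build (but don't send) a request to inspect how params are
# encoded into the query string.
req = requests.Request('GET', 'https://httpbin.org/get',
                       params=[('key1', 'value1'), ('key2', 'value2')])
prepared = req.prepare()
print(prepared.url)  # https://httpbin.org/get?key1=value1&key2=value2
```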
requests.get() “allow_redirects”
This parameter is not required and can be True or False. By default, this value is True: redirects are followed. If False, redirection to another website or another web page on the same site is prevented.
response = requests.get('https://app.finxter.com', allow_redirects=False)
print(response.status_code)
response.close()
- Line [1] attempts to connect to the URL and sets allow_redirects to False.
- Line [2] outputs the response code to the terminal.
- Line [3] closes the open connection.
Output
302
requests.get() “auth”
Often referred to as Basic Authentication, this is one of the simplest methods. This option is not required. By default, this value is None: no authentication required. The format is a tuple with two elements.
response = requests.get('https://www.facebook.com/', auth=('username', 'password'))
print(response.status_code)
response.close()
- Line [1] attempts to connect to the website and sets auth to a username and password.
- Line [2] outputs the response code to the terminal.
- Line [3] closes the open connection.
Output
200
For additional authentication methods, consult the requests documentation.
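The two-element tuple is shorthand for requests' HTTPBasicAuth class. Preparing a request without sending it shows the Authorization header both forms produce; the credentials below are placeholders:

```python
import requests
from requests.auth import HTTPBasicAuth

# auth=('user', 'pass') and auth=HTTPBasicAuth('user', 'pass') are
# equivalent; both add a Base64-encoded Authorization header.
req = requests.Request('GET', 'https://httpbin.org/basic-auth/user/pass',
                       auth=HTTPBasicAuth('user', 'pass'))
prepared = req.prepare()
print(prepared.headers['Authorization'])  # Basic dXNlcjpwYXNz
```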
requests.get() “cert” and “verify”
The cert parameter points to a client-side SSL certificate used for HTTPS requests. An SSL certificate is a small file that ties a certificate to a company's details; a website with an SSL certificate is assumed to be secure. The verify parameter controls server certificate checking: by default, verify is True and the server's SSL certificate is validated. If the certificate is invalid, an SSLError will occur.
response = requests.get('https://somesite.com', cert='certs/my_cert.cert')
print(response.status_code)
response.close()
- Line [1] attempts to connect to the URL and sets cert to the location and filename of the SSL certificate.
- Line [2] outputs the response code to the terminal.
- Line [3] closes the open connection.
Output
200
If unsuccessful, an error code outputs to the terminal displaying the details. Perhaps the SSL certificate was not set up or was set up improperly. To get around this, set verify to False.
response = requests.get('https://somesite.com', cert='certs/my_cert.cert', verify=False)
print(response.status_code)
response.close()
Output
For this example, a successful status code returns. However, a warning about certificate validation also displays.
<Response [200]> ... Unverified HTTPS request is being made to host 'somesite.com'. Adding certificate verification is strongly advised. ...
requests.get() “cookies”
This parameter is not required and is a dictionary of cookies sent to a specified URL. By default, the value is None: no cookies sent.
This example uses the test website httpbin and issues a custom cookie to a URL.
my_cookies = dict(cookies_are='working')
response = requests.get('http://httpbin.org/cookies', cookies=my_cookies)
print(response.text)
response.close()
- Line [1] creates a cookie.
- Line [2] passes a URL and sets cookies to my_cookies.
- Line [3] outputs the contents to the terminal.
- Line [4] closes the open connection.
Output
{ "cookies": { "cookies_are": "working" } }
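When cookies need to persist across multiple requests, a requests.Session manages the cookie jar automatically. A minimal offline sketch, seeding the jar by hand the same way a Set-Cookie response header would:

```python
import requests

# A Session keeps cookies between requests; set() seeds the jar
# manually for demonstration.
session = requests.Session()
session.cookies.set('cookies_are', 'working')
print(session.cookies.get('cookies_are'))  # working
```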
requests.get() “headers”
This parameter is not required. By default, this value is None. If set, a dictionary of HTTP headers transfers to the specified URL.
When an HTTP request initiates, a User-Agent string transfers along with the request. This string contains the following details of your system:
- The application type.
- The operating system.
- The software vendor.
- The software version of the requesting User-Agent.
That server uses these details to determine the capability of your computer.
This code will send its header information to a server.
hdrs = {"Connection": "keep-alive", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"}
response = requests.get('https://app.finxter.com', headers=hdrs)
print(response.headers)
response.close()
- Line [1] saves a dictionary containing a well-formed User-Agent string to the hdrs variable.
- Line [2] attempts to connect to the URL and sets headers to hdrs.
- Line [3] outputs the header response to the terminal.
- Line [4] closes the open connection.
Output
{'Server': 'nginx/1.14.0 (Ubuntu)', 'Date': 'Fri, 05 Nov 2021 16:59:19 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Frame-Options': 'DENY', 'Vary': 'Cookie', 'X-Content-Type-Options': 'nosniff', 'Set-Cookie': 'sessionid=0fb6y6y5d8xoxacstf74ppvacpmt2tin; expires=Fri, 19 Nov 2021 16:59:19 GMT; HttpOnly; Max-Age=19600; Path=/; SameSite=Lax', 'Content-Encoding': 'gzip'}
💡 Note: This is a great Python feature. If you are interested in Web Scraping, you may want to delve further into this topic.
requests.get() “proxies”
If you are an avid web scraper or need to keep your online presence hidden, using proxies is the answer. Proxies hide your IP address from the outside world.
There are several free/paid proxy services where a list of IP addresses is available and updated daily.
💡 Note: The Finxter Academy does not guarantee any IP addresses. You will need to source your own.
For this example, we get a new IP address from a free proxy service and add it to a dictionary.
the_url = 'https://somewebsite.com'
my_proxy = {"https": "https://157.245.222.225:3128"}
response = requests.get(the_url, proxies=my_proxy)
print(response.status_code)
response.close()
- Line [1] sets a URL to the_url variable.
- Line [2] adds one fresh proxy as of this writing in the form of a dictionary.
- Line [3] attempts to connect to the URL and sets proxies to my_proxy.
- Line [4] outputs the status code response to the terminal.
- Line [5] closes the open connection.
Output
200
requests.get() “stream”
This parameter is not required. By default, this value is False: the response content downloads immediately. If True, the content is streamed.
response = requests.get('https://app.finxter.com/static/favicon_coffee.png', stream=True)
print(response.status_code)
response.close()
- Line [1] sets the URL to the logo location and sets stream to True.
- Line [2] outputs the status code response to the terminal.
- Line [3] closes the open connection.
Output
200
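With stream=True, the response body is not downloaded until accessed. The usual pattern, sketched below, iterates over the body in chunks so a large file never sits in memory all at once. The local filename is chosen for illustration:

```python
import requests

# Stream the logo in 8 KB chunks, writing each chunk straight to disk.
url = 'https://app.finxter.com/static/favicon_coffee.png'
response = requests.get(url, stream=True, timeout=(2, 4))
with open('favicon_coffee.png', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
response.close()
```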
Exception Handling
There is a large number of exceptions associated with the requests library. To view a detailed list, consult the requests documentation.
There are two ways to approach handling this situation:
Individually
For this example, we have added a timeout to requests.get(). If the connection or server times out, an exception will occur.
try:
    response = requests.get('https://app.finxter.com', timeout=(2, 4))
    print(response.status_code)
    response.close()
except requests.ConnectTimeout:
    print('Timed Out!')
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] attempts to connect to the URL and sets a timeout.
- Line [3] outputs the status code to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If a timeout occurs, the code falls to here.
- Line [6] outputs the message Timed Out! to the terminal. The script terminates.
Output
200
All Exceptions
All exceptions from the requests library inherit from requests.exceptions.RequestException
. For this example, this code captures all exceptions.
try:
    response = requests.get('https://app.finxter.com', timeout=(2, 4))
    print(response.status_code)
    response.close()
except requests.exceptions.RequestException as e:
    print(e)
- Line [1] initializes the try statement. The code inside here will run first.
- Line [2] attempts to connect to the URL and sets a timeout.
- Line [3] outputs the status code to the terminal.
- Line [4] closes the open connection.
- Line [5] is the except statement. If any exception occurs, the code falls here.
- Line [6] outputs the exception message (e) to the terminal. The script terminates.
Output
200
You could also convert the above into a reusable function. Modify this code to meet your requirements.
def error_code(url):
    try:
        response = requests.get(url, timeout=(2, 4))
        response.close()
    except requests.exceptions.RequestException as e:
        return e

nok = error_code('https://app.finxter.c')
print(nok)
Summary
In this article, we learned how to:
- Connect to a URL
- Retrieve and display status codes
- Output the HTML code to the terminal
- Use the try/except statement to catch errors
- Set a timeout
- Close any open connections
- Send data via a URL
- Allow or prevent redirects
- Use authentication
- Use an SSL certificate and verify the same
- Use cookies
- Use headers
- Use proxies
- Use a stream
- Implement Exception Handling