Python requests.get() – The Ultimate Guide


Syntax

requests.get(url, **kwargs)

You can replace **kwargs with one or more of the following keyword arguments, comma-separated:

Parameter: Description
url (required): The URL of the request.
params (optional): Send data using a URL query string. Dictionary, list of tuples, or bytes.
allow_redirects (optional): By default, True: redirects are allowed. If False, the code prevents redirection to another website or another web page on the same site.
auth (optional): Often referred to as Basic Authentication. By default, None: no authentication required. The format is a tuple with two elements (username, password).
cert (optional): Path to a client-side SSL certificate file (or a (cert, key) tuple) to use for the request. By default, None: no client certificate sent.
cookies (optional): Dictionary of cookies sent to the specified URL. By default, None: no cookies sent.
headers (optional): Dictionary of HTTP headers to send to the specified URL. By default, None.
proxies (optional): Dictionary mapping protocol to proxy URL. Proxies hide your IP address from the outside world.
stream (optional): By default, False: the response body downloads immediately. If True, the body streams and downloads only when accessed.
timeout (optional): How long the code will wait before timing out, in seconds (a number, or a (connect, read) tuple).
verify (optional): Boolean or string to verify the server's TLS certificate. Default is True; if the certificate is invalid, an SSLError occurs.
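Several of these parameters can be combined in a single call. The sketch below uses the library's own Request/PreparedRequest machinery to build (but not send) a request, so it runs without a network connection and shows how params and headers end up in the outgoing request:

```python
import requests

# Build and prepare a request without sending it, to inspect how the
# optional parameters are applied (URL is the example site used below).
req = requests.Request(
    'GET',
    'https://books.toscrape.com/',
    params={'page': 1},                    # appended as a query string
    headers={'User-Agent': 'my-app/1.0'},  # sent as an HTTP header
)
prepared = req.prepare()

print(prepared.url)                    # https://books.toscrape.com/?page=1
print(prepared.headers['User-Agent'])  # my-app/1.0
```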

Preparation

Before any requests can occur, one new library will require installation.

  • The Requests library allows access to its many methods and makes data manipulation a breeze!

To install this library, navigate to an IDE terminal and execute the command below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install requests

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import requests

Background

Using this library, the coder can connect to websites, access their content, and perform various data manipulation tasks. There are many libraries that make HTTP requests. However, the requests library seems to be the most popular.

When the requests library sends a request to a URL, the following occurs:

  • A DNS lookup converts the URL to an IP address (example: 192.0.2.1), and the requests library sends a request to this IP address.
  • The server attempts to validate this request.
  • The server returns a status code as shown below.

💡 Note: The URL https://books.toscrape.com used for some examples in this article welcomes coders and encourages scraping.


Status Codes

Direct quote from Wikipedia:

HTTP response status codes are separated into five classes or categories. The first digit of the status code defines the class of response. The last two digits do not have any classifying or categorization role. These five classes are:

1XX (Informational Response): The request was received, continuing process.
2XX (Success): The request was successfully received, understood, and accepted.
3XX (Redirection): Further action is needed to complete the request.
4XX (Client Error): The request contains invalid syntax or incomplete data.
5XX (Server Error): The server failed to fulfill a valid request.
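Since the first digit alone determines the class, integer division by 100 is enough to categorize any status code. A small offline sketch:

```python
# The first digit of a status code determines its class.
def status_class(code):
    return code // 100

labels = {
    1: 'Informational Response',
    2: 'Success',
    3: 'Redirection',
    4: 'Client Error',
    5: 'Server Error',
}

print(labels[status_class(200)])  # Success
print(labels[status_class(404)])  # Client Error
```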

Making a Request

The requests.get() function uses a GET request to connect to a website. It takes a URL as an argument. In this example, a status code returns and displays the status of the connection (success/failure). If the connection fails, the script abruptly ends.

Run this script. If successful, a status code starting with 2XX outputs to the terminal.

response = requests.get('https://books.toscrape.com')
print(response.status_code)
response.close()
  • Line [1] attempts to connect to the URL.
  • Line [2] outputs the status code.
  • Line [3] closes the open connection.

OR

print(requests.codes.ok)

Output

200
200

Note that requests.get() does not raise an error for a non-200 status code, but the request itself can fail (for example, if the server is unreachable) and crash the script. To prevent this, wrap the code in a try/except statement.

try:
    response = requests.get('https://books.toscrape.com')
    print('OK')
    response.close()
except requests.exceptions.RequestException:
    print('Error')
  • Line [1] initializes the try statement. The code inside here will run first.
    • Line [2] performs a GET request to connect to the URL.
    • Line [3] if successful, OK is output to the terminal.
    • Line [4] closes the open connection.
  • Line [5] is the except statement. If the try statement fails, the code falls to here.
    • Line [6] outputs the message Error to the terminal. The script terminates.
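If you also want 4XX/5XX status codes to raise an error, the library provides response.raise_for_status(). The sketch below builds a Response object by hand, purely so the behavior can be demonstrated without a network call; in real code you would call raise_for_status() on the object returned by requests.get():

```python
import requests
from requests.models import Response

# Hand-built Response used only for offline demonstration.
resp = Response()
resp.status_code = 404

try:
    resp.raise_for_status()  # raises HTTPError for 4XX/5XX status codes
except requests.exceptions.HTTPError as e:
    print('Caught:', e)
```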

Response Content

When the code shown below runs, the HTML code on the requested web page is output to the terminal.

try:
    response = requests.get('https://books.toscrape.com')
    print(response.text)
    response.close()
except requests.exceptions.RequestException:
    print('Error')
  • Line [1] initializes the try statement. The code inside here will run first.
    • Line [2] performs a GET request to connect to the URL.
    • Line [3] if successful, the HTML code of the page is output to the terminal.
    • Line [4] closes the open connection.
  • Line [5] is the except statement. If the try statement fails, the code falls to here.
    • Line [6] outputs Error to the terminal. The script terminates.

Output

A small portion of the HTML code displays below.

<article class="product_pod">
<div class="image_container">
<a href="catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html"><img src="media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg" alt="The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics" class="thumbnail"></a>
</div>
...
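response.text returns the body decoded to a string using response.encoding; the raw bytes are available as response.content. The offline sketch below fakes a response body via the internal _content attribute (an assumption made only so the example runs without a network call; a real response fills it for you):

```python
from requests.models import Response

# Fake a response body offline; _content is an internal attribute used
# here only for demonstration purposes.
resp = Response()
resp._content = b'<p>caf\xc3\xa9</p>'  # raw bytes as received
resp.encoding = 'utf-8'

print(resp.content)  # raw bytes: b'<p>caf\xc3\xa9</p>'
print(resp.text)     # decoded string: <p>café</p>
```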

Parameters

requests.get() “timeout”

This parameter allows the coder to set how long the code will wait before timing out for:

  • a connection
  • a response

In the example below, the connection time equals 2 seconds. The response time equals 4 seconds.

The best practice is to add the timeout parameter to every request made.

💡 Note: If no timeout is set, the request can hang indefinitely waiting for a response.

try:
    response = requests.get('https://books.toscrape.com', timeout=(2, 4))
    print(response.text)
    response.close()
except requests.exceptions.RequestException:
    print('Error')
  • Line [1] initializes the try statement. The code inside here will run first.
    • Line [2] performs a GET request to connect to the URL and sets a timeout.
    • Line [3] if the response is successful, the HTML code from the URL outputs to the terminal.
    • Line [4] closes the open connection.
  • Line [5] is the except statement. If the try statement fails, the code falls to here.
    • Line [6] outputs Error to the terminal. The script automatically terminates.

Output

See above.


requests.get() “params”

At some point, you may need to send data using a URL query string. If the query is hard-coded, the format would be similar to below.

Example: https://somewebsite.com?key1=val&key2=val

💡 Note: A question mark (?) separates the URL from the query string. If passing more than one key-value pair, use the ampersand (&) between additional pairs.

The requests library allows you to easily pass these arguments as a dictionary, a list of tuples, or bytes.

For this example, the test website httpbin is the URL.

key_vals  = {'key1': 'value1', 'key2': 'value2'}
response = requests.get('https://httpbin.org/get', params=key_vals)
print(response.url)
response.close()
  • Line [1] assigns two key:value pairs to a dictionary.
  • Line [2] attempts to connect to the URL and passes the key_vals dictionary to params.
  • Line [3] outputs the URL with the contents of key_vals appended.
  • Line [4] closes the open connection.

Output

https://httpbin.org/get?key1=value1&key2=value2
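As noted above, params also accepts a list of tuples, which allows a key to repeat (impossible with a dictionary). The library's prepare_url() helper shows the resulting URL without sending anything:

```python
from requests.models import PreparedRequest

# params as a list of tuples allows repeated keys.
req = PreparedRequest()
req.prepare_url('https://httpbin.org/get',
                [('tag', 'python'), ('tag', 'requests')])

print(req.url)  # https://httpbin.org/get?tag=python&tag=requests
```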

requests.get() “allow_redirects”

This parameter is not required and can be True or False. By default, the value is True: redirects are allowed. If False, the code prevents redirection to another website or another web page on the same site.

response = requests.get('https://app.finxter.com', allow_redirects=False)
print(response.status_code)
response.close()
  • Line [1] attempts to connect to the URL and sets allow_redirects to False.
  • Line [2] outputs the response code to the terminal.
  • Line [3] closes the open connection.

Output

302

requests.get() “auth”

Often referred to as Basic Authentication, this is one of the simplest authentication methods. This parameter is not required. By default, the value is None: no authentication required. The format is a tuple with two elements (username, password).

response = requests.get('https://www.facebook.com/', auth=('username', 'password'))
print(response.status_code)
response.close()
  • Line [1] attempts to connect to the website and sets auth to a username and password.
  • Line [2] outputs the response code to the terminal.
  • Line [3] closes the open connection.

Output

200
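Under the hood, the (username, password) tuple becomes an HTTPBasicAuth object, which adds a base64-encoded Authorization header to the request. The sketch below prepares a request without sending it to show that header (the URL is httpbin's basic-auth test endpoint):

```python
from requests.auth import HTTPBasicAuth
from requests.models import PreparedRequest

# auth=('user', 'pass') is shorthand for HTTPBasicAuth('user', 'pass').
req = PreparedRequest()
req.prepare(
    method='GET',
    url='https://httpbin.org/basic-auth/user/pass',
    auth=HTTPBasicAuth('user', 'pass'),
)

print(req.headers['Authorization'])  # Basic dXNlcjpwYXNz
```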

For additional authentication methods, see the requests documentation.


requests.get() “cert” and “verify”

These parameters control SSL certificates for HTTPS requests.

An SSL certificate is a small file that ties a certificate to a company’s details. A website with a valid SSL certificate is assumed to be secure. By default, requests verifies the server’s SSL certificate; if the certificate is invalid, an SSLError occurs. The cert parameter supplies a client-side certificate: the path to a certificate file, or a (cert, key) tuple.

response = requests.get('https://somesite.com', cert='certs/my_cert.cert')
print(response.status_code)
response.close()
  • Line [1] attempts to connect to the URL and sets cert to the location and filename of the SSL certificate.
  • Line [2] outputs the response code to the terminal.
  • Line [3] closes the open connection.

Output

200

If unsuccessful, an error code outputs to the terminal displaying the details. Perhaps the SSL certificate was not set up or was set up improperly. To get around this, set verify to False. Note that this disables certificate verification entirely and should be avoided in production.

response = requests.get('https://somesite.com', cert='certs/my_cert.cert', verify=False)
print(response.status_code)
response.close()

Output

For this example, the successful status code (200) is returned. However, we did get a warning about validation.

<Response [200]>
    ...
    Unverified HTTPS request is being made to host 'somesite.com'. Adding certificate verification is strongly advised. 
    ...

requests.get() “cookies”

This parameter is not required and is a dictionary of cookies sent to a specified URL. By default, the value is None: no cookies sent.

This example uses the test website httpbin and issues a custom cookie to a URL.

my_cookies = dict(cookies_are='working')
response = requests.get('http://httpbin.org/cookies', cookies=my_cookies)
print(response.text)
response.close()
  • Line [1] creates a cookie.
  • Line [2] passes a URL and sets cookies to my_cookies.
  • Line [3] outputs the contents to the terminal.
  • Line [4] closes the open connection.

Output

{
  "cookies": {
    "cookies_are": "working"
  }
}
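The cookies dictionary is serialized into a single Cookie request header. Preparing a request offline shows the header that would be sent:

```python
from requests.models import PreparedRequest

# The cookies dict becomes a "Cookie" header on the outgoing request.
req = PreparedRequest()
req.prepare(
    method='GET',
    url='http://httpbin.org/cookies',
    cookies={'cookies_are': 'working'},
)

print(req.headers['Cookie'])  # cookies_are=working
```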

requests.get() “headers”

This parameter is not required. By default, the value is None. If set, a dictionary of HTTP headers transfers to the specified URL.

When an HTTP request initiates, a User-Agent string transfers along with the request. This string contains the following details of your system:

  • The application type.
  • The operating system.
  • The software vendor.
  • The software version of the requesting User-Agent.

The server uses these details to determine the capability of your computer.

This code will send its header information to a server.

hdrs = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"}

response = requests.get('https://app.finxter.com', headers=hdrs)
print(response.headers)
response.close()
  • Line [1] saves a well-formed User-Agent string to the hdrs variable.
  • Line [2] attempts to connect to the URL and sets headers to hdrs.
  • Line [3] outputs the header response to the terminal.
  • Line [4] closes the open connection.

Output

{'Server': 'nginx/1.14.0 (Ubuntu)', 'Date': 'Fri, 05 Nov 2021 16:59:19 GMT', 'Content-Type': 'text/html; charset=utf-8',
'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Frame-Options': 'DENY', 'Vary': 'Cookie', 'X-Content-Type-Options': 'nosniff', 'Set-Cookie': 'sessionid=0fb6y6y5d8xoxacstf74ppvacpmt2tin; expires=Fri, 19 Nov 2021 16:59:19 GMT; HttpOnly; Max-Age=19600; Path=/; SameSite=Lax', 'Content-Encoding': 'gzip'}

💡 Note: This is a great requests feature. If you are interested in web scraping, you may want to delve further into this topic.


requests.get() “proxies”

If you are an avid web scraper or need to keep your online presence hidden, proxies are the answer. Proxies hide your IP address from the outside world.

There are several free/paid proxy services where a list of IP addresses is available and updated daily.

💡 Note: The Finxter Academy does not guarantee any IP addresses. You will need to source your own.

For this example, we get a new IP address from a free proxy service and add it to a dictionary.

the_url  = 'https://somewebsite.com'
my_proxy = {"https": "https://157.245.222.225:3128"}
response = requests.get(the_url, proxies=my_proxy)
print(response.status_code)
response.close()
  • Line [1] sets a URL to the_url variable.
  • Line [2] adds one fresh proxy as of this writing in the form of a dictionary.
  • Line [3] attempts to connect to the URL and sets proxies to my_proxy.
  • Line [4] outputs the status code response to the terminal.
  • Line [5] closes the open connection.

Output

200
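For repeated requests, the proxy mapping can also live on a Session, so every request made through that session reuses it. The addresses below are placeholders, not working proxies:

```python
import requests

# Scheme -> proxy URL mapping; these addresses are placeholders only.
proxies = {
    'http':  'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

session = requests.Session()
session.proxies.update(proxies)  # applied to every request via this session

print(session.proxies['https'])  # http://10.10.1.10:1080
```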

requests.get() “stream”

This parameter is not required. By default, the value is False: the response body downloads immediately. If True, the body streams and downloads only when accessed.

response = requests.get('https://app.finxter.com/static/favicon_coffee.png', stream=True)
print(response.status_code)
response.close()
  • Line [1] sets the URL to the logo location and sets stream to True.
  • Line [2] outputs the status code response to the terminal.
  • Line [3] closes the open connection.

Output

200
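The usual reason to set stream=True is to download a large file in chunks with iter_content() instead of holding the whole body in memory. A sketch, assuming network access and that the URL above is still reachable (any failure is caught rather than crashing):

```python
import requests

# Stream a file to disk in chunks; requires network access, so any
# failure is caught rather than crashing the script.
downloaded = False
try:
    response = requests.get(
        'https://app.finxter.com/static/favicon_coffee.png',
        stream=True,
        timeout=(2, 4),
    )
    with open('favicon_coffee.png', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)  # write each chunk as it arrives
    response.close()
    downloaded = True
except requests.exceptions.RequestException as e:
    print('Download failed:', e)
```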

Exception Handling

There is a large number of exceptions associated with the requests library. To view a detailed list, see the requests documentation.

There are two ways to approach handling this situation:

Individually

For this example, we have added a timeout to requests.get(). If the connection or server times out, an exception will occur.

try:
    response = requests.get('https://app.finxter.com', timeout=(2, 4))
    print(response.status_code)
    response.close()
except requests.exceptions.ConnectTimeout:
    print('Timed Out!')
  • Line [1] initializes the try statement. The code inside here will run first.
    • Line [2] attempts to connect to the URL and sets a timeout.
    • Line [3] outputs the status code to the terminal.
    • Line [4] closes the open connection.
  • Line [5] is the except statement. If a timeout occurs, the code falls to here.
    • Line [6] outputs the message Timed Out! to the terminal. The script terminates.

Output

200

All Exceptions

All exceptions from the requests library inherit from requests.exceptions.RequestException. For this example, this code captures all exceptions.

try:
    response = requests.get('https://app.finxter.com', timeout=(2, 4))
    print(response.status_code)
    response.close()
except requests.exceptions.RequestException as e:
    print(e)
  • Line [1] initializes the try statement. The code inside here will run first.
    • Line [2] attempts to connect to the URL and sets a timeout.
    • Line [3] outputs the status code to the terminal.
    • Line [4] closes the open connection.
  • Line [5] is the except statement. If any exception occurs, the code falls here.
    • Line [6] outputs the exception message (e) to the terminal. The script terminates.

Output

200

You could also convert the above into a reusable function. Modify this code to meet your requirements.

def error_code(url):
    try:
        response = requests.get(url, timeout=(2, 4))
        response.close()
    except requests.exceptions.RequestException as e:
        return e

nok = error_code('https://app.finxter.c')
print(nok)
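Because every exception the library raises inherits from requests.exceptions.RequestException, you can verify the hierarchy directly:

```python
import requests

# Each specific exception is a subclass of RequestException,
# so a single handler can catch them all.
specific = (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout,
    requests.exceptions.HTTPError,
    requests.exceptions.ConnectionError,
)

for exc in specific:
    print(exc.__name__, issubclass(exc, requests.exceptions.RequestException))
```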

Summary

In this article, we learned how to:

  • Connect to a URL
  • Retrieve and display status codes
  • Output the HTML code to the terminal
  • Use the try/except statement to catch errors
  • Set a timeout
  • Close any open connections
  • Send data via a URL
  • Allow or prevent redirects
  • Use authentication
  • Use an SSL certificate and verify the same
  • Use cookies
  • Use headers
  • Use proxies
  • Use a stream
  • Implement Exception Handling