5 Best Ways to Use Chrome WebDriver in Selenium to Download Files in Python

💡 Problem Formulation: Automating file downloads within a web browser can be critical for tasks such as data scraping or testing file download features in web applications. This article explains how to accomplish file downloads using the Chrome WebDriver in Selenium with Python. For instance, let’s assume we want to download a PDF report from a web page automatically without manual intervention.

Method 1: Setting Chrome Options

Chrome WebDriver can be customized using ChromeOptions to specify the default download directory and disable the download prompt. By adding preferences to ChromeOptions, we can control the behavior of the file download process.

Here’s an example:

from selenium import webdriver

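options = webdriver.ChromeOptions()
# Save downloads to this directory and skip the "Save as" dialog.
prefs = {'download.default_directory': '/path/to/directory', 'download.prompt_for_download': False}
options.add_experimental_option('prefs', prefs)
# Selenium 4 takes the options object via the `options` keyword (`chrome_options` was removed).
driver = webdriver.Chrome(options=options)
driver.get('http://example.com/some-file.pdf')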

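The file is downloaded to the specified directory automatically. Note that Chrome may display a PDF in its built-in viewer instead of downloading it unless the plugins.always_open_pdf_externally preference is also set, as in Method 2.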

This code creates an instance of Chrome WebDriver with specified preferences that tell the browser to download files to a given directory without asking for a confirmation. It is useful for automation scripts where no user interaction is desired.
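
One thing the snippet above does not cover is knowing when the download has finished, since driver.get() can return while Chrome is still writing the file. The following is a minimal sketch of a polling helper. It assumes that Chrome marks in-progress downloads with a .crdownload suffix and that the download directory is empty before the download starts; the name wait_for_download and its parameters are hypothetical, not part of Selenium.

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Return the paths of finished files once no partial download remains.

    Assumption: Chrome names in-progress downloads with a '.crdownload'
    suffix, and `directory` held no files before the download started.
    Raises TimeoutError if nothing finishes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = sorted(n for n in names if not n.endswith(".crdownload"))
        if finished and not partial:
            return [os.path.join(directory, n) for n in finished]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no completed download in {directory!r}")
        time.sleep(poll_interval)
```

A script would typically call wait_for_download('/path/to/directory') right after driver.get(...) and before driver.quit(), so the browser is not closed in the middle of a transfer.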

Method 2: Handling MIME Types

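Another way to ensure that files are downloaded without interruptions is to tell Chrome how to treat particular file types. Unlike Firefox, Chrome has no preference that takes a list of MIME types, so the equivalent is a set of preferences that stop Chrome from opening files such as PDFs in the browser and that allow downloads without prompts.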

Here’s an example:

from selenium import webdriver

# Chrome has no MIME-type preference; the prefs below control per-type behaviour instead.
options = webdriver.ChromeOptions()
options.add_experimental_option("prefs", {
    "download.default_directory": "/path/to/directory",
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,  # Download PDFs instead of showing them in Chrome
    "profile.default_content_settings.popups": 0,  # Suppress pop-ups
    "download.extensions_to_open": "",  # Do not auto-open any file type after download
    "plugins.plugins_disabled": ["Chrome PDF Viewer"],  # Legacy key; recent Chrome versions may ignore it
    "profile.content_settings.exceptions.automatic_downloads.*.setting": 1,  # Allow multiple automatic downloads
    "profile.managed_default_content_settings.images": 2  # Optional: block images to speed up page loads
})
# Selenium 4 takes the options object via the `options` keyword.
driver = webdriver.Chrome(options=options)
driver.get('http://example.com/some-sheet.xls')

The XLS file is downloaded automatically.

This code snippet configures Chrome so that files such as ‘.pdf’ and ‘.xls’ are downloaded directly, without a prompt and without being opened in the browser. This configuration is especially useful when dealing with several known file types.
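
Because the same preferences tend to be reused across scripts, it can help to build them in one place. The sketch below is one possible way to do that. The function name build_download_prefs is hypothetical, and the choice of keys is an assumption based on commonly used Chrome preferences, which are not formally documented and may change between Chrome versions. The directory is converted to an absolute path because Chrome generally expects one for download.default_directory.

```python
import os


def build_download_prefs(directory, open_pdf_externally=True):
    """Build a Chrome 'prefs' dict for prompt-free downloads.

    Assumption: these preference keys are honoured by the installed
    Chrome version; they are widely used but not a stable public API.
    """
    return {
        # Chrome generally expects an absolute path here.
        "download.default_directory": os.path.abspath(directory),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        # When True, PDFs are downloaded rather than shown in the viewer.
        "plugins.always_open_pdf_externally": open_pdf_externally,
    }
```

The result can be passed straight to options.add_experimental_option("prefs", build_download_prefs("downloads")).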

Method 3: Custom Download Handlers

For more control over the download process, you can fetch the file yourself with Python’s built-in urllib module and use Selenium only to locate the download link.

Here’s an example:

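import urllib.request
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Open the page that contains the download link (not the file itself).
driver.get('http://example.com')
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...).
download_link = driver.find_element(By.XPATH, "//a[@href='/download-url']")
# get_attribute('href') returns the link's absolute URL.
url = download_link.get_attribute('href')
urllib.request.urlretrieve(url, "/path/to/some-file.zip")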

This would download the ZIP file directly from its URL, bypassing the WebDriver.

In this method, the actual download is handled by urllib instead of the WebDriver. After locating the download link using Selenium, the URL is handed off to urllib to manage the file download.
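
For large files, or when a timeout is wanted, the hand-off to urllib can also be written with urlopen and a streamed copy. This is a minimal sketch rather than a drop-in replacement; the function name download_to_file and the default timeout are assumptions made for illustration.

```python
import shutil
import urllib.request


def download_to_file(url, destination, timeout=30):
    """Stream the response body at `url` into `destination` and return the path.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    failed request does not silently produce a file with an error page.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out_file:
        # Copy in chunks so the whole file never has to fit in memory.
        shutil.copyfileobj(response, out_file)
    return destination
```

In the example above, the urlretrieve call could then be replaced with download_to_file(url, "/path/to/some-file.zip").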

Method 4: Integration with Python Requests

For a more direct and scriptable file download approach, you can integrate Selenium WebDriver with the Requests library to handle downloads.

Here’s an example:

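import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://example.com')
# Locate the link with the Selenium 4 locator API.
download_link = driver.find_element(By.XPATH, "//a[@href='/some-file.pdf']")
url = download_link.get_attribute('href')
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # Fail early on HTTP errors instead of saving an error page.
# Use a context manager so the file is flushed and closed.
with open('/path/to/file.pdf', 'wb') as f:
    f.write(r.content)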

The PDF file is saved to the desired location.

This snippet uses Selenium to navigate and obtain the file’s URL and then employs the Requests library to download the file. The combination is useful because Selenium handles pages that need a real browser, while Requests gives direct control over the HTTP request, including redirects, headers, and error handling.
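
If the file sits behind a login, the request made by Requests will not carry the browser’s session by default. One possible approach, sketched below, is to copy the cookies from the WebDriver into the request. The helper names cookies_to_dict and download_with_browser_cookies are hypothetical. Flattening cookies to a name/value mapping also drops their domain and path scoping, which is an acceptable simplification only when all cookies belong to the site being downloaded from.

```python
def cookies_to_dict(selenium_cookies):
    """Convert the list returned by driver.get_cookies() into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def download_with_browser_cookies(driver, url, destination):
    """Fetch `url` with the browser's cookies and User-Agent, then save it."""
    import requests  # third-party; imported here so cookies_to_dict has no dependencies

    # Reuse the browser's User-Agent so the server sees a consistent client.
    user_agent = driver.execute_script("return navigator.userAgent;")
    response = requests.get(
        url,
        cookies=cookies_to_dict(driver.get_cookies()),
        headers={"User-Agent": user_agent},
        allow_redirects=True,
        timeout=30,
    )
    response.raise_for_status()
    with open(destination, "wb") as out_file:
        out_file.write(response.content)
    return destination
```

After logging in through Selenium, a call such as download_with_browser_cookies(driver, url, '/path/to/file.pdf') would then reuse the authenticated session.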

Bonus One-Liner Method 5: Direct Download via URL

If the direct URL of the file is known, using the Requests library alone, without Selenium, can be the quickest way to download a file.

Here’s an example:

import requests
content = requests.get('http://example.com/some-file.pdf', allow_redirects=True).content  # file contents as bytes

The file’s contents are returned as a bytes object, which still has to be written to disk to save the file.

This one-liner directly requests the file from the given URL. This method is ideal when dealing with direct file links and when the setup complexity of Selenium is not warranted.
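
To actually save the file, the bytes need a destination name. The sketch below derives one from the URL and streams the response to disk. Both helper names, filename_from_url and save_url, are hypothetical, and the fallback name download.bin is an arbitrary assumption for URLs that end in a slash.

```python
import os
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of `url`, or `default` if there is none."""
    name = posixpath.basename(unquote(urlparse(url).path))
    return name or default


def save_url(url, directory="."):
    """Stream `url` to `directory` in chunks and return the saved path."""
    import requests  # third-party dependency, as in the one-liner above

    destination = os.path.join(directory, filename_from_url(url))
    with requests.get(url, allow_redirects=True, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(destination, "wb") as out_file:
            for chunk in response.iter_content(chunk_size=64 * 1024):
                out_file.write(chunk)
    return destination
```

Streaming with stream=True and iter_content keeps memory use flat for large files, at the cost of a few more lines than the one-liner.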

Summary/Discussion

  • Method 1: Setting Chrome Options. Strengths: Customizable and easy to set up. Weaknesses: Relies heavily on Chrome’s settings, so less flexibility outside pre-defined options.
  • Method 2: Handling MIME Types. Strengths: Very effective for handling known file types. Weaknesses: Chrome offers no MIME-type list, so new file types may need their own preferences, and preference keys can change between Chrome versions.
  • Method 3: Custom Download Handlers. Strengths: More control over the download process. Weaknesses: More complex setup and requires additional code for error handling.
  • Method 4: Integration with Python Requests. Strengths: Leverages the power of Requests for downloading files. Weaknesses: May require handling cookies or session management manually.
  • Method 5: Direct Download via URL. Strengths: Simplest method with minimal code. Weaknesses: Limited to scenarios where the file URL is already known.