5 Best Ways to Find Out When a Download Has Completed Using Python and Selenium

💡 Problem Formulation: In automated testing or web scraping tasks using Python and Selenium, it’s often necessary to download files from the web. The challenge is knowing exactly when a file download has completed so that subsequent actions can be taken. As input, we interact with a webpage to trigger a download; as the desired output, we programmatically detect that the download has completed before proceeding to the next steps.

Method 1: Polling the Download Directory

Periodically checking, or polling, the download directory can be an effective way to detect when a download has completed. This technique involves checking the target folder at set intervals for the downloaded file under its final name, that is, without a temporary extension such as ‘.part’ (Firefox), ‘.crdownload’ (Chrome), or ‘.tmp’, which browsers use for incomplete downloads. This method relies on the fact that when a file finishes downloading, the temporary extension is removed.

Here’s an example:

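import os
import time

download_dir = "/path/to/download"
file_name = "my_expected_file_name"
file_path = os.path.join(download_dir, file_name)

# Poll once per second until the file appears under its final name.
# Note: this loop has no timeout and runs forever if the download never finishes.
while not os.path.exists(file_path):
    time.sleep(1)

print("Download completed!")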

Output:

Download completed!

This snippet checks for the presence of a file called “my_expected_file_name” in the specified “download_dir” directory. A while loop keeps checking until the file appears, pausing for one second between checks to avoid using too much CPU. When the file is detected, the loop exits and the script prints “Download completed!”.
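
The loop above never gives up if the download fails, and it does not look at temporary files. The following is a minimal sketch of a bounded version, assuming the browser marks unfinished downloads with a ‘.crdownload’, ‘.part’, or ‘.tmp’ suffix and that the download directory already exists. The helper name wait_for_download, its parameters, and the TEMP_SUFFIXES tuple are illustrative choices, not part of Selenium.

```python
import os
import time

# Assumed temporary suffixes for in-progress downloads
# (Chrome: ".crdownload", Firefox: ".part", generic: ".tmp").
TEMP_SUFFIXES = (".crdownload", ".part", ".tmp")


def wait_for_download(download_dir, file_name, timeout=60.0, poll_interval=0.5):
    """Return the file path once it exists and no temporary files remain.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    target = os.path.join(download_dir, file_name)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Any leftover temporary file means some download is still running
        in_progress = any(
            name.endswith(TEMP_SUFFIXES) for name in os.listdir(download_dir)
        )
        if os.path.isfile(target) and not in_progress:
            return target
        time.sleep(poll_interval)
    raise TimeoutError(f"{file_name} did not finish downloading within {timeout} s")
```

A call such as wait_for_download("/path/to/download", "my_expected_file_name", timeout=120) would go right after the step that triggers the download. Note that the temporary-file check is directory-wide, so an unrelated download in the same folder also delays the result.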

Method 2: Checking Download Status via Browser Logs

Selenium can interact with browser logs where information about network activity, including file downloads, is kept. By configuring the WebDriver to log this information, we can parse it and check for signals that indicate a download has completed. This method takes advantage of the internal reporting mechanism of the web browser.

Here’s an example:

from selenium import webdriver

# Enable performance logging; recent Chrome versions expect the
# vendor-prefixed capability name 'goog:loggingPrefs'
options = webdriver.ChromeOptions()
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=options)

# Trigger download...

# Check the browser logs for download status
logs = driver.get_log('performance')
for entry in logs:
    # Each entry is a dict whose 'message' field holds a JSON string
    message = entry['message']
    if 'my_expected_file_name' in message and 'Network.responseReceived' in message:
        print('Download detected in logs.')
        break

driver.quit()

Output:

Download detected in logs.

The code first configures the Chrome WebDriver to capture performance logs, then triggers a file download. After that, it iterates through the logs looking for entries that contain both the expected file name and the ‘Network.responseReceived’ event. That event indicates that the browser has received a response for the request, so it shows that the download has started rather than that the file has been fully written to disk. The example assumes the download was triggered before the logs are read.
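
To get closer to a completion signal, the log entries can be parsed as JSON and a matching ‘Network.responseReceived’ event paired with a later ‘Network.loadingFinished’ event for the same request ID. The sketch below is a pure function over the list returned by driver.get_log('performance'); the name finished_download_urls is hypothetical, and whether Chrome emits ‘Network.loadingFinished’ for a request that it hands over to its download manager is an assumption you should verify in your own setup.

```python
import json


def finished_download_urls(log_entries, url_fragment):
    """Return URLs containing `url_fragment` whose requests finished loading.

    `log_entries` is expected to look like the output of
    driver.get_log('performance'): dicts whose 'message' value is a JSON string.
    """
    pending = {}   # requestId -> URL of responses that matched the fragment
    finished = []
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        if method == "Network.responseReceived":
            url = params.get("response", {}).get("url", "")
            if url_fragment in url:
                pending[params["requestId"]] = url
        elif method == "Network.loadingFinished":
            # Only report requests whose response we matched earlier
            url = pending.pop(params.get("requestId"), None)
            if url is not None:
                finished.append(url)
    return finished
```

Because get_log() typically returns only the entries recorded since the previous call, a polling loop built on this function would need to accumulate entries across calls before passing them in.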

Method 3: Observing File Size Changes

Another way to check if a file download has completed is by observing changes in its size over time. This method assumes the file has been created and is visible in the file system, growing in size as it is being downloaded. Once the file size remains constant over a few observation cycles, we can infer that the download has finished.

Here’s an example:

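import os
import time

download_dir = "/path/to/download"
file_name = "my_expected_file"
file_path = os.path.join(download_dir, file_name)

# Wait for the file to appear so a missing file is not mistaken for a finished one
while not os.path.exists(file_path):
    time.sleep(1)

last_size = -1
stable_checks = 0
# Require three consecutive unchanged readings before declaring success
while stable_checks < 3:
    file_size = os.path.getsize(file_path)
    if file_size == last_size:
        stable_checks += 1
    else:
        last_size = file_size
        stable_checks = 0
    time.sleep(2)

print("Download completed!")

Output:

Download completed!

This snippet first waits for the expected file to appear, then checks its size every two seconds. The loop exits once the size has stayed the same for three consecutive checks, which is taken as a sign that the download has completed. Waiting for the file to exist first prevents the loop from exiting prematurely before the download starts. A stalled connection can still look like a finished download, so choose the interval and the number of checks to suit your network.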

Method 4: Using WebDriver Wait to Monitor File System

Combining Selenium’s wait conditions with file system operations can be a smart way to check for download completion. By using a WebDriver wait condition, you create a blocking call that only returns once a certain condition is met, such as the existence of a fully downloaded file without any browser-specific temporary extensions.

Here’s an example:

import os

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
# Trigger download...

download_dir = "/path/to/download"
file_name = "my_expected_file"

def check_download(driver):
    # WebDriverWait.until() calls this function with the driver as its argument
    return os.path.exists(os.path.join(download_dir, file_name))

WebDriverWait(driver, 60).until(check_download)
print("Download completed!")

driver.quit()

Output:

Download completed!

After initializing the Chrome WebDriver and starting the download, the example defines a function check_download() that determines whether the expected file exists. WebDriverWait calls this function repeatedly for up to 60 seconds and raises a TimeoutException if it never returns a truthy value. Once the file is detected, the message “Download completed!” is printed and the WebDriver quits.
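
The same idea can be packaged as a reusable condition. The sketch below is one possible shape, not an official Selenium helper: file_exists_condition and wait_for_file are hypothetical names, and the choice to return None on timeout instead of letting TimeoutException propagate is an assumption about what the calling code prefers.

```python
import os


def file_exists_condition(file_path):
    """Build a callable suitable for WebDriverWait.until()."""

    def _condition(driver):
        # until() passes the driver in; this check only needs the file system.
        # Returning the path (a truthy value) lets until() hand it back.
        return file_path if os.path.isfile(file_path) else False

    return _condition


def wait_for_file(driver, file_path, timeout=60):
    """Return file_path once it exists, or None if the wait times out."""
    # Imported lazily so file_exists_condition() works without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        return WebDriverWait(driver, timeout, poll_frequency=1).until(
            file_exists_condition(file_path)
        )
    except TimeoutException:
        return None
```

Since until() returns whatever truthy value the condition produces, the caller gets the file path back directly and can open the file without rebuilding the path.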

Bonus Method 5: Using Python’s Watchdog Library

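Watchdog is a third-party Python library that monitors file system events. You can attach a Watchdog observer to the download directory and be notified when the expected file appears. The advantage of this approach is that it’s event-driven, so your code does not have to poll the directory. Keep in mind that a ‘created’ event fires when a file first appears, which can be before it is fully written, and that a browser which downloads under a temporary name and renames the file at the end produces a ‘moved’ event for the final name instead.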

Here’s an example:

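import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

download_dir = "/path/to/download"
file_name = "my_expected_file"
download_done = threading.Event()

class DownloadHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Fires when the file first appears under the expected name
        if event.src_path.endswith(file_name):
            download_done.set()

    def on_moved(self, event):
        # Fires when a temporary file is renamed to the expected name
        if event.dest_path.endswith(file_name):
            download_done.set()

observer = Observer()
observer.schedule(DownloadHandler(), download_dir, recursive=False)
observer.start()

# Trigger download...

try:
    # Block for up to 60 seconds instead of looping forever
    if download_done.wait(timeout=60):
        print("Download completed!")
    else:
        print("Timed out waiting for the download.")
finally:
    observer.stop()
    observer.join()

Output:

Download completed!

This code snippet uses the Watchdog library to monitor the specified download directory. The DownloadHandler class is a custom event handler that reacts to ‘created’ and ‘moved’ events and checks whether the affected path ends with the expected file name. When it does, the handler sets a threading.Event, and the main thread, which has been blocked in wait() for up to 60 seconds, prints “Download completed!”. The observer is stopped in the finally block whether or not the file arrived.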

Summary/Discussion

  • Method 1: Polling the Download Directory. Strengths: Easy to implement. Weaknesses: Inefficient, as it requires continuous checking, and it needs its own timeout.
  • Method 2: Checking Download Status via Browser Logs. Strengths: Leverages built-in browser functionality. Weaknesses: May require processing a large amount of log data, and a received response does not prove the file has been written to disk.
  • Method 3: Observing File Size Changes. Strengths: Detects when a file has stopped growing. Weaknesses: Relies on visible file system changes, adds a delay, and can mistake a stalled download for a finished one.
  • Method 4: Using WebDriver Wait to Monitor File System. Strengths: Reuses Selenium’s built-in wait, including its timeout. Weaknesses: Still polls under the hood and depends on a correctly written wait condition.
  • Method 5: Using Python’s Watchdog Library. Strengths: Uses filesystem events, reducing unnecessary polling. Weaknesses: Requires third-party package installation and potentially complex setup.