5 Best Ways to Use Boto3 Library in Python to Upload an Object to S3 Using AWS Resource

πŸ’‘ Problem Formulation: When working with AWS S3, a common task is to upload files or objects to an S3 bucket. In Python, this is typically accomplished using the Boto3 library, which provides an interface to Amazon Web Services, including S3. The goal is to understand how to set up Boto3 and execute an upload operation, wherein the input is a file on your local system, and the desired output is that file securely stored in an S3 bucket.

Method 1: Using Boto3 S3 Resource With Upload_file Method

One of the simplest ways to upload a file to an S3 bucket using the Boto3 library in Python is by using the S3 Resource and its upload_file method. This method requires the file path, bucket name, and the object name as parameters. It handles multipart uploads automatically for large files and is considered a high-level utility.

Here’s an example:

import boto3

# Initialize S3 Resource
s3_resource = boto3.resource('s3')

# Upload a file
s3_resource.Bucket('my-bucket').upload_file('path/to/myfile.txt', 'myfile.txt')

Output: The file myfile.txt is uploaded to the my-bucket S3 bucket.

This code initializes an S3 resource object and then calls the upload_file method on the specified bucket object, providing the local filename and the target file name in the bucket. It’s straightforward and abstracts away the complexities of file uploads.
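In the snippet above the object name is passed explicitly. If you want the key to always mirror the local filename, a small helper can derive it with os.path.basename (a sketch; local_path_to_key is a name invented here, not part of boto3):

```python
import os

def local_path_to_key(file_path):
    # Derive the S3 object key from the local file's base name.
    # (Hypothetical helper for illustration.)
    return os.path.basename(file_path)

key = local_path_to_key('path/to/myfile.txt')
# then, for example:
# s3_resource.Bucket('my-bucket').upload_file('path/to/myfile.txt', key)
```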

Method 2: Using Boto3 Client With Put_object Method

For those who need more control over the upload process, the Boto3 Client interface offers the put_object method. It allows setting extra attributes such as metadata or server-side encryption, and it can also upload file-like objects that do not reside on disk.

Here’s an example:

import boto3

# Initialize S3 Client
s3_client = boto3.client('s3')

# Upload a file
with open('path/to/myfile.txt', 'rb') as data:
    s3_client.put_object(Bucket='my-bucket', Key='myfile.txt', Body=data)

Output: The file myfile.txt is uploaded to the my-bucket S3 bucket with the specified parameters.

The code snippet opens the file in binary read mode and then performs the upload using the put_object method on the client. It gives more control over the upload parameters and allows for additional configurations if needed.
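Because put_object accepts any bytes-like Body, you can also upload in-memory data and attach the extra attributes mentioned above. A sketch of the keyword arguments you might pass (the metadata values, key name, and encryption choice here are illustrative assumptions):

```python
import io

body = io.BytesIO(b"hello from memory")  # data that never touches the disk

# Keyword arguments that could be passed to s3_client.put_object(...):
put_kwargs = {
    'Bucket': 'my-bucket',
    'Key': 'greeting.txt',
    'Body': body,
    'Metadata': {'uploaded-by': 'example-script'},  # custom user metadata
    'ServerSideEncryption': 'AES256',               # request SSE-S3 encryption
}
# s3_client.put_object(**put_kwargs)
```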

Method 3: Uploading A File In Chunks With Multipart Upload

Multipart upload is a way to upload large files in chunks, making it more reliable and potentially faster. By uploading parts in parallel, network issues affecting one part will not necessitate restarting the entire upload. Boto3 handles the multipart upload process behind the scenes when using the upload_file method.

Here’s an example:

import boto3
from boto3.s3.transfer import TransferConfig

# Configuration for multipart upload (5 MB threshold and chunk size)
config = TransferConfig(multipart_threshold=1024 * 1024 * 5, multipart_chunksize=1024 * 1024 * 5)

# Initialize S3 Resource
s3_resource = boto3.resource('s3')

# Multipart upload
s3_resource.meta.client.upload_file('path/to/largefile.mp4', 'my-bucket', 'largefile.mp4', Config=config)

Output: The large file largefile.mp4 is uploaded in chunks to the my-bucket S3 bucket.

This approach sets a TransferConfig with a specific chunk size for multipart uploads and then performs the upload with this configuration. The process is automatic and beneficial for large file uploads.
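To see how the chunk size translates into parts, you can compute the part count for a given file size with plain arithmetic (this helper is for illustration and is independent of boto3):

```python
import math

def multipart_part_count(file_size, chunk_size):
    # Number of parts S3 would receive for this file/chunk combination.
    return math.ceil(file_size / chunk_size)

# A 23 MB file with 5 MB chunks needs 5 parts (4 full parts + 1 partial).
parts = multipart_part_count(23 * 1024 * 1024, 5 * 1024 * 1024)
```

Note that uploads smaller than multipart_threshold are sent in a single request, so the part count only matters for files above that size.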

Method 4: Uploading Files Asynchronously

Asynchronous file upload can make better use of your bandwidth and CPU by uploading multiple files at the same time, or by letting your program continue with other work without waiting for an upload to finish. Python's concurrent.futures module can help achieve this.

Here’s an example:

import boto3
import concurrent.futures

def upload_to_s3(bucket, object_name, file_path):
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_path, bucket, object_name)
    except Exception as e:
        print(e)
    
# List of files to upload
files = [('my-bucket', 'file1.txt', 'path/to/file1.txt'), ('my-bucket', 'file2.txt', 'path/to/file2.txt')]

# Asynchronously upload files
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(upload_to_s3, *file) for file in files]
    for future in concurrent.futures.as_completed(futures):
        future.result()  # surfaces any exception not handled inside upload_to_s3

Output: Multiple files are uploaded concurrently to the my-bucket S3 bucket.

This code defines an upload function that takes in bucket, object name, and file path arguments to use with a ThreadPoolExecutor for uploading several files in parallel. This method is most effective when dealing with multiple file uploads.
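The same ThreadPoolExecutor pattern can also collect results as uploads finish. A self-contained sketch with a stand-in for the real upload call (fake_upload is a placeholder invented here, not a boto3 function):

```python
import concurrent.futures

def fake_upload(bucket, object_name, file_path):
    # Placeholder for s3_client.upload_file(...); returns the key on "success".
    return object_name

files = [('my-bucket', 'file1.txt', 'path/to/file1.txt'),
         ('my-bucket', 'file2.txt', 'path/to/file2.txt')]

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(fake_upload, *f) for f in files]
    # as_completed yields futures in finish order; result() re-raises errors.
    uploaded = sorted(f.result() for f in concurrent.futures.as_completed(futures))
```

Collecting result() values this way both gathers return data and re-raises any exception from a worker, which is usually preferable to silently discarding the futures.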

Bonus One-Liner Method 5: Streamlined Upload Using Lambda

A single line of code can perform an upload using a Python lambda function, which can suit short, throwaway scripts where brevity is at a premium. Note that a Python lambda is unrelated to the AWS Lambda service, despite the shared name.

Here’s an example:

import boto3

# Upload file in one line using a lambda (the key is the file's base name)
(lambda f: boto3.client('s3').upload_file(f, 'my-bucket', f.split('/')[-1]))('path/to/myfile.txt')

Output: The file myfile.txt is uploaded to the my-bucket S3 bucket.

This is a concise, though not necessarily clearer, way to upload a single file. The lambda function takes a file path as an argument, initializes a client, and performs the upload. This might be less readable but is efficient if minimalism is required.

Summary/Discussion

  • Method 1: Using S3 Resource With Upload_file Method. Simplifies the upload process and is great for straightforward single-file uploads. It abstracts multipart handling but exposes fewer per-request options than put_object.
  • Method 2: Using Boto3 Client With Put_object Method. Offers more control and is best for developers who need detailed configurations like metadata or server-side encryption.
  • Method 3: Uploading A File In Chunks With Multipart Upload. Best for large files as it’s more efficient and robust. Parallel uploads of chunks may improve overall upload time.
  • Method 4: Uploading Files Asynchronously. Ideal for multiple file uploads or non-blocking operations. Utilizes threads for concurrent upload tasks, which helps because uploads are I/O-bound rather than CPU-bound.
  • Bonus Method 5: Streamlined Upload Using Lambda. Quick and efficient for single file uploads, especially where code brevity is essential. However, this method sacrifices readability and may be difficult to maintain.