5 Best Ways to Read and Write TAR Archive Files Using Python tarfile

💡 Problem Formulation: When working with TAR archive files in Python, developers often need to create, extract, or inspect these archives efficiently. The input is typically a file path or a file-like object, and the goal is to read from or write to a TAR archive. This article demonstrates how to perform these actions using the Python tarfile module.

Method 1: Creating a TAR Archive

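To create a TAR archive, call the tarfile.open() function in “w” (write) mode and use add() to include files. add() handles regular files and directories (directories are added recursively by default) and records file permissions and other metadata. Note that “w” mode overwrites an existing archive of the same name.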

Here’s an example:

import tarfile

with tarfile.open('example.tar', 'w') as tar:
    tar.add('sample.txt')

Output: A new file example.tar is created containing the sample.txt file.

This code snippet creates a TAR archive named ‘example.tar’ and adds ‘sample.txt’ into this archive. Opening the archive using the ‘with’ statement ensures that the file is properly closed after its block is executed, preventing any resource leaks.
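The same pattern extends to compressed archives and to whole directories. The sketch below is one possible variation, not the only way to do it: the helper name create_gzip_archive and all file names are made up for illustration, and it assumes you want gzip compression (mode 'w:gz') and a fixed name inside the archive (the arcname parameter of add()).

```python
import tarfile
import tempfile
from pathlib import Path


def create_gzip_archive(archive_path, source, arcname=None):
    """Write `source` (a file or directory) into a gzip-compressed TAR archive.

    Hypothetical helper for illustration only.
    """
    # 'w:gz' writes a gzip-compressed archive; plain 'w' writes it uncompressed.
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname sets the name stored in the archive, so the full
        # source path on disk does not end up inside it.
        tar.add(str(source), arcname=arcname)


# Demo in a throwaway directory (all names here are assumptions).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "sample.txt").write_text("hello\n")

    archive = Path(tmp) / "example.tar.gz"
    create_gzip_archive(archive, project, arcname="project")

    # Reopen the archive and list the member names it contains.
    with tarfile.open(str(archive), "r:gz") as tar:
        names = sorted(tar.getnames())

print(names)
```

Listing the names after writing is a cheap way to confirm that the directory and its contents were stored under the names you expected.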

Method 2: Extracting a TAR Archive

Extracting a TAR archive is straightforward using the extractall() method, which extracts all contents into the current directory or into a specified path. It takes care of creating directories and files as they appear in the archive.

Here’s an example:

import tarfile

with tarfile.open('example.tar', 'r') as tar:
    tar.extractall(path='output_directory/')

Output: Files from example.tar are extracted into the ‘output_directory/’.

In this example, we open ‘example.tar’ for reading and extract all contained files to ‘output_directory/’. The same context manager pattern as before safely manages the archive’s lifecycle.
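When an archive comes from an untrusted source, extractall() deserves some care, because member names can point outside the target directory. Newer Python releases (3.12, plus security updates of some older versions) accept a filter argument for this. The sketch below assumes that checking hasattr(tarfile, 'data_filter') is an acceptable way to detect support; the helper name extract_safely and the file names are made up for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_safely(archive_path, destination):
    """Extract every member, using the 'data' filter when it is available.

    Hypothetical helper for illustration only.
    """
    with tarfile.open(str(archive_path), "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # The 'data' filter refuses members that would land outside
            # the destination and special files such as devices.
            tar.extractall(path=str(destination), filter="data")
        else:
            # Older interpreters have no filter support:
            # only extract archives you trust.
            tar.extractall(path=str(destination))


# Demo in a throwaway directory (all names here are assumptions).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.txt"
    src.write_text("hello\n")

    archive = Path(tmp) / "example.tar"
    with tarfile.open(str(archive), "w") as tar:
        tar.add(str(src), arcname="sample.txt")

    out_dir = Path(tmp) / "output_directory"
    out_dir.mkdir()
    extract_safely(archive, out_dir)
    extracted_text = (out_dir / "sample.txt").read_text()

print(extracted_text)
```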

Method 3: Adding Multiple Files to an Archive

The add() method takes a single path per call, so to add several files you iterate over a collection of file paths and call add() once for each. This is useful for bundling several files into one archive.

Here’s an example:

import tarfile

files_to_archive = ['file1.txt', 'file2.txt', 'file3.jpg']
with tarfile.open('bundle.tar', 'w') as tar:
    for file in files_to_archive:
        tar.add(file)

Output: A TAR file bundle.tar containing ‘file1.txt’, ‘file2.txt’, and ‘file3.jpg’.

This snippet creates a TAR archive that includes several files specified in the list `files_to_archive`. It showcases how to bundle multiple disparate files into a single archive.
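If the files live in different directories, the paths you pass to add() are stored in the archive as given. A possible variation, sketched below, stores each file under its base name only. It assumes the base names are unique; the helper name bundle_files and the file names are made up for illustration.

```python
import os
import tarfile
import tempfile


def bundle_files(archive_path, file_paths):
    """Add each path to a new archive under its base name only.

    Hypothetical helper; assumes no two paths share a base name.
    """
    with tarfile.open(archive_path, "w") as tar:
        for path in file_paths:
            # arcname replaces the on-disk path with a flat name in the archive.
            tar.add(path, arcname=os.path.basename(path))


# Demo in a throwaway directory (all names here are assumptions).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("file1.txt", "file2.txt"):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(name + "\n")
        paths.append(path)

    archive = os.path.join(tmp, "bundle.tar")
    bundle_files(archive, paths)

    # Members come back in the order they were added.
    with tarfile.open(archive, "r") as tar:
        member_names = tar.getnames()

print(member_names)
```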

Method 4: Reading Individual Files in a TAR Archive

Individual files within a TAR archive can be read using the extractfile() method. This is useful for when you need access to a specific file’s content without extracting the entire archive.

Here’s an example:

import tarfile

with tarfile.open('bundle.tar', 'r') as tar:
    f = tar.extractfile('file1.txt')
    if f is not None:
        content = f.read()
        print(content)

Output: The contents of the file file1.txt are printed as a bytes object, since extractfile() returns a binary file-like object.

The code reads the contents of ‘file1.txt’ from within ‘bundle.tar’ without extracting it to disk. This operation is particularly useful for archives with numerous or large files when you only need data from a single file.
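When the archive's layout is not known in advance, it can help to list the members first and to handle a missing name explicitly, since looking up a name that is not in the archive raises KeyError. The following is a minimal sketch under those assumptions; the helper names list_regular_files and read_member_text are made up for illustration, and the text is assumed to be UTF-8.

```python
import tarfile
import tempfile
from pathlib import Path


def list_regular_files(archive_path):
    """Return the names of all regular-file members (hypothetical helper)."""
    with tarfile.open(str(archive_path), "r") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_member_text(archive_path, member_name, encoding="utf-8"):
    """Return a member's text, or None if it is missing or not a file."""
    with tarfile.open(str(archive_path), "r") as tar:
        try:
            member = tar.getmember(member_name)
        except KeyError:
            return None  # no member with that name
        f = tar.extractfile(member)
        if f is None:
            return None  # a directory or another non-file member
        with f:
            # extractfile() yields bytes, so decode them explicitly.
            return f.read().decode(encoding)


# Demo in a throwaway directory (all names here are assumptions).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "file1.txt"
    src.write_bytes(b"first file\n")
    archive = Path(tmp) / "bundle.tar"
    with tarfile.open(str(archive), "w") as tar:
        tar.add(str(src), arcname="file1.txt")

    files = list_regular_files(archive)
    text = read_member_text(archive, "file1.txt")
    missing = read_member_text(archive, "nope.txt")

print(files, repr(text), missing)
```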

Bonus One-Liner Method 5: Creating an Archive from a Directory

Python can archive an entire directory in a single call. The tarfile module itself has no one-call helper for this, but shutil.make_archive() from the standard library, which uses tarfile internally for the 'tar' format, does the job.

Here’s an example:

import shutil

shutil.make_archive('directory_archive', 'tar', root_dir='.', base_dir='my_directory')

Output: An archive directory_archive.tar containing ‘my_directory/’ and everything inside it.

This concise call archives the entire contents of ‘my_directory/’ into a TAR file ‘directory_archive.tar’; the ‘.tar’ extension is appended automatically. It’s an efficient way to bundle directories without writing manual loops.

Summary/Discussion

  • Method 1: Creating a TAR Archive. Straightforward. Limited to one file or directory at a time.
  • Method 2: Extracting a TAR Archive. Simple to implement. Extracts all files, which can be inefficient if you need only a few.
  • Method 3: Adding Multiple Files to an Archive. Flexible. Requires iterating through files, which can be verbose for large sets.
  • Method 4: Reading Individual Files in a TAR Archive. Reads without extracting. Can become unwieldy if the archive’s structure is unknown or complex.
  • Bonus Method 5: Creating an Archive from a Directory. Extremely convenient. Only applicable for directories, not individual file selections.