5 Best Ways to Read or Write Binary Data in Python

💡 Problem Formulation: When working with binary files in Python, such as image or audio files, you may need to read or write raw bytes directly. This article walks through several ways to handle binary files using Python's built-in capabilities. Whether you're extracting raw pixel data from an '.img' file or processing audio from an '.mp3', knowing how to read and write this data efficiently is crucial.

Method 1: Using the built-in open() function

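Python's built-in open() function with the 'rb' or 'wb' mode is the standard way to read or write binary data. 'rb' stands for 'read binary', and 'wb' stands for 'write binary'. In binary mode, read() returns bytes objects and write() accepts bytes-like objects, with no text decoding or newline translation. The approach is straightforward for any file size, but calling read() with no argument loads the whole file into memory, so large files are better read in fixed-size chunks.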

Here's an example:

with open('example.bin', 'rb') as binary_file:
    data = binary_file.read()

with open('output.bin', 'wb') as binary_file:
    binary_file.write(data)

Output: A new file 'output.bin' with the content copied from 'example.bin'.

This example reads the entire content of 'example.bin' as binary data and then writes that data into a new file named 'output.bin'. The 'with' statement ensures that each file is properly closed when its block finishes, even if an exception is raised.
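For files too large to hold comfortably in memory, the same open() calls can be combined with a fixed-size read loop. The following is a minimal sketch, not the only way to do it: the helper name copy_in_chunks and the 64 KiB chunk size are assumptions chosen for illustration, the demo writes a throwaway file in a temporary directory, and the walrus operator requires Python 3.8 or later.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary choice for this sketch


def copy_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path without loading the whole file into memory."""
    total = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # read() returns b'' at end of file, which ends the loop
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            total += len(chunk)
    return total


# Demo with a throwaway file so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'example.bin')
    dst_path = os.path.join(tmp, 'output.bin')
    with open(src_path, 'wb') as f:
        f.write(os.urandom(200_000))
    copied = copy_in_chunks(src_path, dst_path)
    with open(src_path, 'rb') as a, open(dst_path, 'rb') as b:
        same = a.read() == b.read()

print(copied, same)  # 200000 True
```

Only one chunk is held in memory at a time, so peak memory use stays roughly constant regardless of file size.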

Method 2: Using memoryview for buffer protocol

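Memoryview objects allow Python code to access the internal data of an object that supports the buffer protocol without copying. Slicing a memoryview yields another view onto the same memory rather than a new bytes object, which is useful when working with large binary buffers. A memoryview is writable only if the underlying object is mutable, such as a bytearray; a view over bytes is read-only. It's a more advanced technique that can yield performance benefits.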

Here's an example:

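with open('example.bin', 'rb') as binary_file:
    # Wrap the data in a bytearray so the memoryview is writable;
    # a memoryview over bytes would be read-only
    buffer = memoryview(bytearray(binary_file.read()))

# Display the first 10 bytes (printing the view itself shows only <memory at 0x...>)
print(bytes(buffer[:10]))

# Supposing we manipulate buffer, now write it back
with open('modified.bin', 'wb') as binary_file:
    binary_file.write(buffer)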

Output: The first 10 bytes of data from 'example.bin'.

This code snippet demonstrates how to read binary data into a memoryview for access or manipulation. The first 10 bytes are displayed for demonstration purposes. Then the (potentially manipulated) buffer is written back to a new file, 'modified.bin'.
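To make the "without copying" point concrete, here is a small sketch that fills a preallocated bytearray with readinto() and then edits it through a memoryview slice. The file contents (bytes 0 through 15) and the variable names are made up for the demo; readinto() is the standard method on binary file objects.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.bin')
    with open(path, 'wb') as f:
        f.write(bytes(range(16)))  # demo data: bytes 0x00 through 0x0f

    # Preallocate a mutable buffer and let the file fill it in place
    buf = bytearray(os.path.getsize(path))
    with open(path, 'rb') as f:
        n_read = f.readinto(buf)

view = memoryview(buf)
header = view[:4]    # slicing a memoryview gives another view, not a copy
header[0] = 0xFF     # writes through to buf because bytearray is mutable
first_four = bytes(header)

print(n_read, first_four, buf[0])  # 16 b'\xff\x01\x02\x03' 255
```

Because header shares memory with buf, the change made through the slice is visible in the original buffer.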

Method 3: Using the struct module for binary data

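The struct module in Python is used to convert between Python values and C structs represented as Python bytes objects. This is particularly useful for handling binary data with known, structured formats. It can unpack data to Python objects and pack Python objects into binary data. Format strings without a byte-order prefix use the machine's native byte order, sizes, and alignment; prefix the format with '<' (little-endian) or '>' (big-endian) when the file must be portable across platforms.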

Here's an example:

import struct

# Simulate binary data for a point struct having two integers representing coordinates x, y
binary_data = struct.pack('ii', 10, 20)

# Write binary data
with open('coordinates.bin', 'wb') as file:
    file.write(binary_data)

# Read the binary data back
with open('coordinates.bin', 'rb') as file:
    x, y = struct.unpack('ii', file.read())

print(f"Coordinates: x = {x}, y = {y}")

Output: Coordinates: x = 10, y = 20

The snippet creates a binary representation of a structure containing two integers (simulating coordinates) using struct.pack(), writes it to a file, and reads it back, converting it to a Python tuple containing two integers using struct.unpack().
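Real files usually hold many records rather than one. The sketch below, under the assumption of a made-up record layout of two little-endian 4-byte integers, precompiles the format with struct.Struct and decodes every record with iter_unpack(). An io.BytesIO stream stands in for a file opened in 'wb' or 'rb' mode so the example runs without touching disk.

```python
import io
import struct

# '<' = little-endian, standard sizes, no padding; 'ii' = two 4-byte signed ints
POINT = struct.Struct('<ii')

points = [(10, 20), (-5, 7), (300, 400)]

# An in-memory stream stands in for a real binary file
stream = io.BytesIO()
for x, y in points:
    stream.write(POINT.pack(x, y))

raw = stream.getvalue()
# iter_unpack() requires len(raw) to be a multiple of POINT.size
decoded = list(POINT.iter_unpack(raw))

print(POINT.size, len(raw), decoded)
# 8 24 [(10, 20), (-5, 7), (300, 400)]
```

Specifying the byte order explicitly means the same bytes decode identically on any machine, which the native 'ii' format does not guarantee.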

Method 4: Using array module for homogeneous data

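The array module can be used to create compact arrays of basic values (characters, integers, floating-point numbers), which are stored as binary data. This method is a memory-efficient way to read and write arrays of a uniform type. Note that tofile() writes the values in the machine's native representation, so the resulting file is not automatically portable between platforms with different byte orders.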

Here's an example:

from array import array

numbers = array('d', [1.1, 2.2, 3.3])

# Write array to file as binary
with open('numbers.bin', 'wb') as file:
    numbers.tofile(file)

# Read array from binary file
new_numbers = array('d')
with open('numbers.bin', 'rb') as file:
    new_numbers.fromfile(file, 3)

print(new_numbers)

Output: array('d', [1.1, 2.2, 3.3])

In this example, we create an array of doubles (floating-point numbers) and write it to a binary file using array.tofile(). We then read from the file and load the data into another array using array.fromfile(). The result is a memory-efficient representation of numerical data.
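The example above hard-codes the item count passed to fromfile(). When the count is not known in advance, one option is to derive it from the file size and the array's itemsize, as in this sketch. It assumes the file contains nothing but doubles written by tofile() on the same machine, and it uses a temporary directory so the demo is self-contained.

```python
import os
import tempfile
from array import array

values = array('d', [1.1, 2.2, 3.3, 4.4])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'numbers.bin')
    with open(path, 'wb') as f:
        values.tofile(f)

    loaded = array('d')
    # Derive the item count from the file size instead of hard-coding it
    count = os.path.getsize(path) // loaded.itemsize
    with open(path, 'rb') as f:
        loaded.fromfile(f, count)

print(count, loaded == values)  # 4 True
```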

Bonus One-Liner Method 5: Using List Comprehension with bytes or bytearray

When working with binary data that's small enough to be held comfortably in memory, list comprehension alongside bytes (immutable) or bytearray (mutable) objects provides a succinct way to process binary data.

Here's an example:

data = bytearray([0x10, 0x20, 0x30])

# One-liner to invert all bits in a bytearray using list comprehension
inverted_data = bytearray([~b & 0xFF for b in data])
print(inverted_data)

Output: bytearray(b'\xef\xdf\xcf')

This code inverts all the bits in a bytearray by using a list comprehension that applies a bitwise NOT to every element. Because ~b on a Python int yields a negative number (for example, ~0x10 is -17), the & 0xFF mask brings each result back into the 0-255 range that a byte can hold.
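The same pattern works with the immutable bytes type, which accepts any iterable of integers in the range 0 to 255, so a generator expression can replace the intermediate list. As a small sketch, the following XORs each byte with a single-byte mask; the mask value 0x5A is a hypothetical choice for illustration only.

```python
data = bytes([0x10, 0x20, 0x30])
key = 0x5A  # hypothetical single-byte mask, chosen only for this demo

# bytes() consumes the generator directly; no intermediate list is built
masked = bytes(b ^ key for b in data)

# XOR with the same key is its own inverse, so this restores the input
restored = bytes(b ^ key for b in masked)

print(masked.hex(), restored == data)  # 4a7a6a True
```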

Summary/Discussion

  • Method 1: Built-in open function. Suitable for a wide range of use cases. Easy to use, but read() with no argument loads the entire file into memory, which can be inefficient for large files.
  • Method 2: Memoryview. Allows slicing and in-place manipulation of large binary buffers without copying. More complex to use but can provide performance benefits for large data.
  • Method 3: Struct module. Perfect for structured binary data. Requires knowledge of the data structure but gives precise control over the representation of binary data.
  • Method 4: Array module. Ideal for homogeneous data types and efficient memory usage. Limited to a single basic type per array and not as general-purpose.
  • Bonus Method 5: List comprehension with bytes/bytearray. Offers a quick and flexible way to process binary data in memory but not suitable for very large data sets.