Read a Binary File into a Byte Array in Python

As a software developer, I had a requirement to read a binary file into a byte array in Python while working on a project for a client in New York. After researching and experimenting with different methods, I discovered several ways to accomplish this task efficiently. In this article, I will share my findings and provide detailed examples.

Table of Contents

Binary Files in Python

Before getting into the code, let’s understand what binary files are. Unlike text files, which contain human-readable characters, binary files store data in a machine-readable format. Each byte in a binary file represents a value ranging from 0 to 255. Binary files can contain any type of data, including text, numbers, or custom data structures.

Read How to Create Arrays in Python

Open a Binary File

To read a binary file in Python, we need to open it in binary mode using the 'rb' flag. The 'r' stands for read mode, and 'b' indicates binary mode. Here’s an example:

with open("example.bin", 'rb') as file:
    # Perform operations on the binary file

In this example, we use the with statement to open the binary file named "example.bin". The with statement ensures that the file is properly closed after we’re done with it, even if an exception occurs.

Check out How to Find the Maximum Value in Python Using the max() Function

Read Binary Data into a Byte Array in Python

Once we have the binary file open, we can read its contents into a byte array. Python provides a byte array type, that is mutable and efficient for handling binary data

Here’s an example of reading the entire binary file into a byte array:

with open("example.bin", 'rb') as file:
    byte_array = bytearray(file.read())

Output:

Hello, this is an example binary file.

You can see the executed example code in the below screenshot.

Read a Binary File into a Byte Array in Python

In this code, we use the read() method to read the entire contents of the binary file. The bytearray() constructor then creates a byte array from the binary data.

Read How to Find the Index of an Element in an Array in Python

Read Binary Data in Chunks

Sometimes, binary files can be quite large, and reading the entire file into memory might not be feasible. In such cases, we can read the binary data in chunks. Here’s an example:

chunk_size = 1024  # Read in chunks of 1024 bytes
byte_array = bytearray()

with open("large_file.bin", 'rb') as file:
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            break
        byte_array.extend(chunk)

Output:

This is a test binary file.

You can see the executed example code in the below screenshot.

How to Read a Binary File into a Byte Array in Python

In this example, we define a chunk_size of 1024 bytes. We initialize an empty byte array using bytearray(). Inside the with block, we use a while loop to read the binary file in chunks. The read() method takes the chunk_size as an argument and returns a chunk of binary data. If the chunk is empty, it means we’ve reached the end of the file, and we break out of the loop. Otherwise, we extend the byte_array with the current chunk using the extend() method.

Read How to Check if an Array is Empty in Python

Access and Manipulating the Byte Array in Python

Once we have the binary data in a byte array, we can access and manipulate individual bytes using indexing. Here’s an example:

byte_array = bytearray(b'\x00\x01\x02\x03')

print(byte_array[0])  
print(byte_array[1])  

byte_array[2] = 255
print(byte_array)  # Output: bytearray(b'\x00\x01\xff\x03')

Output:

0
1
bytearray(b'\x00\x01\xff\x03')

You can see the executed example code in the below screenshot.

Read a Binary File into a Byte Array in Python Access and Manipulating the Byte Array

In this example, we create a byte array with four bytes using a byte string literal. We can access individual bytes using indexing, just like with lists. We can also modify the values of specific bytes by assigning new values to the corresponding indices.

Check out How to Check the Length of an Array in Python

Interpret Binary Data

Binary files often contain structured data that needs to be interpreted correctly. Python provides the struct module to unpack binary data into meaningful values. Here’s an example:

import struct

with open("data.bin", 'rb') as file:
    # Read a 32-bit integer and a 64-bit float
    binary_data = file.read(12)  # 4 bytes for integer + 8 bytes for float
    value1, value2 = struct.unpack('>if', binary_data)
    print(f"Value 1: {value1}, Value 2: {value2}")

In this example, we assume that the binary file “data.bin” contains a 32-bit integer followed by a 64-bit float. We read 12 bytes (4 bytes for the integer and 8 bytes for the float) into the binary_data variable. Then, we use the struct.unpack() function to unpack the binary data into the corresponding values based on the format string '>if'. The > indicates big-endian byte order, 'i' represents a 32-bit integer and 'f' represents a 64-bit float.

Check out How to Create a 2D Array in Python

Examples

Let’s consider a real-world example to illustrate the usefulness of reading binary files into byte arrays. Suppose you are working on a project that involves analyzing satellite imagery data. The satellite data is stored in a custom binary format, where each pixel is represented by a 16-bit unsigned integer. Here’s how you can read and process the binary data:

width = 1280
height = 720

with open("satellite_data.bin", 'rb') as file:
    byte_array = bytearray(file.read())

pixel_data = []
for i in range(0, len(byte_array), 2):
    pixel = (byte_array[i] << 8) | byte_array[i+1]
    pixel_data.append(pixel)

# Process the pixel data
for i, pixel in enumerate(pixel_data):
    if pixel > 30000:
        print(f"High-intensity pixel found at index {i}")

In this example, we assume that the satellite data is stored in a file named “satellite_data.bin”. We read the entire binary file into a byte array. Since each pixel is represented by a 16-bit unsigned integer (2 bytes), we iterate over the byte array in step 2. For each pair of bytes, we combine them to form the pixel value using bitwise operations. We store the pixel values in a list called pixel_data.

Finally, we process the pixel_data by iterating over each pixel value. In this example, we check if the pixel value exceeds a threshold of 30000, indicating a high-intensity pixel. We print the index of such pixels for further analysis.

Read How to Initialize a 2D Array in Python

Conclusion

In this tutorial, we explored how to read binary files into byte arrays using Python. We covered the basics of binary files, opening files in binary mode, reading binary data into byte arrays, eading binary data into byte in chunks and interpreting binary data using the struct module. We also looked at a real-world example of processing satellite imagery data stored in a binary format.