Python bytes vs bytearray

What’s the Difference Between bytes() and bytearray()?

The difference between bytes() and bytearray() is that bytes() returns an immutable object and bytearray() returns a mutable one. You can therefore modify a bytearray in place, but not a bytes object.

Here’s a minimal example that demonstrates the difference between the two functions:

>>> a = bytes(3)
>>> b = bytearray(3)
>>> a
b'\x00\x00\x00'
>>> b
bytearray(b'\x00\x00\x00')
>>> a[0] = 1
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    a[0] = 1
TypeError: 'bytes' object does not support item assignment
>>> b[0] = 1
>>> b
bytearray(b'\x01\x00\x00')

You create two variables, a and b. The former is a bytes object and the latter a bytearray, and both hold the same data: three zero bytes.

However, if you try to modify the bytes object, Python raises a TypeError: 'bytes' object does not support item assignment.

You resolve the error by using a bytearray object instead: item assignment such as b[0] = 1 works on it without raising an exception.

The reason is that unlike bytes, the bytearray type is mutable.
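To make the mutability difference more concrete, here is a small sketch (the variable names are made up for illustration). It shows a few in-place operations that a bytearray supports, and one practical consequence of immutability: a bytes object is hashable and can serve as a dictionary key, while a bytearray cannot.

```python
# Illustrative sketch; the variable names are assumptions, not from the article.
data = bytearray(b'abc')

data[0] = ord('x')        # item assignment replaces a single byte
data.append(ord('d'))     # grow by one byte
data.extend(b'ef')        # grow by several bytes
data[1:3] = b'YZ'         # slice assignment
print(data)               # bytearray(b'xYZdef')

frozen = bytes(data)      # immutable snapshot of the current contents
lookup = {frozen: 'ok'}   # bytes is hashable, so it works as a dict key
print(lookup[b'xYZdef'])  # ok

try:
    hash(data)            # bytearray is mutable and therefore unhashable
except TypeError as err:
    print('TypeError:', err)
```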

🌍 Recommended Tutorial: Python Mutable vs Immutable Objects

Let’s recap both functions quickly. Afterward, we will do a performance comparison in case you wondered which is faster!

Python bytes()

Python’s built-in bytes(source) function creates an immutable bytes object initialized as defined in the function argument source.

A bytes object is similar to a string, but instead of Unicode characters it holds a sequence of 8-bit integers in the range 0 <= x < 256.

The returned bytes object is immutable: you cannot change it after creation. If you plan to change the contents, use the bytearray() function to create a mutable bytearray object.

# No argument, then a single integer argument n (creates n zero bytes)
print(bytes())
print(bytes(2))
print(bytes(4))

'''
b''
b'\x00\x00'
b'\x00\x00\x00\x00'
'''
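Besides an integer length, bytes() also accepts other kinds of source arguments. The following sketch (variable names are illustrative) shows an iterable of integers, a string with an explicit encoding, and what happens when a value falls outside the allowed range:

```python
# Sketch of other common argument forms for bytes().
from_ints = bytes([72, 105])         # iterable of ints in the range 0..255
from_text = bytes('Hi', 'utf-8')     # string plus an explicit encoding
from_range = bytes(range(3))         # any iterable of small ints works

print(from_ints)    # b'Hi'
print(from_text)    # b'Hi'
print(from_range)   # b'\x00\x01\x02'

try:
    bytes([256])                     # 256 does not fit into a single byte
except ValueError as err:
    print('ValueError:', err)
```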

🌍 Recommended Tutorial: Python bytes() Built-in Function

Python bytearray()

Python’s built-in bytearray() function takes an iterable such as a list of integers in the range 0 <= x < 256, stores each one as a single byte (00000000 to 11111111 in binary), and returns a new mutable bytearray object. Like bytes(), it also accepts a single integer n and then creates n zero bytes.

# No argument, then a single integer argument n (creates n zero bytes)
print(bytearray())
print(bytearray(2))
print(bytearray(4))

'''
bytearray(b'')
bytearray(b'\x00\x00')
bytearray(b'\x00\x00\x00\x00')
'''
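The example above only uses the integer form. As a complement, here is a small sketch of the iterable form (variable names are illustrative), including the binary view of each stored byte and the error raised for a value that does not fit into one byte:

```python
# Sketch: building a bytearray from an iterable of integers.
values = bytearray([0, 127, 255])

print(values)                               # bytearray(b'\x00\x7f\xff')
print(list(values))                         # [0, 127, 255]
print([format(v, '08b') for v in values])   # ['00000000', '01111111', '11111111']

try:
    bytearray([256])                        # valid values are 0..255 only
except ValueError as err:
    print('ValueError:', err)
```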

🌍 Recommended Tutorial: Python bytearray() Built-in Function

Performance bytes() vs bytearray()

🛑 Attention: My initial performance measurement is flawed. Keep reading to see ASL97’s comments and corrected performance evaluation!

My initial claim was that creating a large bytes object (e.g., 100 million bytes) using bytes() is faster than creating a bytearray with bytearray() because the mutability of the latter comes at a performance cost.

In my experiment creating objects of 100 million bytes, bytes() appeared to be about 3x faster than bytearray().

Here’s the code of the simple experiment setup:

import time


n = 100000000

start = time.time()
b = bytes(b'x' * n)
stop = time.time()
print('bytes()', stop - start)


start = time.time()
b = bytearray(b'x' * n)
stop = time.time()
print('bytearray()', stop - start)

I ran the experiment multiple times on my Windows machine with an Intel Core i7 CPU and 8 GB of RAM and obtained the following output:

= RESTART:
bytes() 0.02437591552734375
bytearray() 0.07250618934631348

= RESTART:
bytes() 0.015362262725830078
bytearray() 0.0646059513092041

= RESTART:
bytes() 0.017394542694091797
bytearray() 0.05153799057006836

= RESTART:
bytes() 0.013019084930419922
bytearray() 0.0436251163482666

= RESTART:
bytes() 0.011996746063232422
bytearray() 0.042777299880981445

So in this setup the measured difference is roughly 3-4x: the bytearray() call takes about three to four times as long as the bytes() call.

However, don’t let this apparent performance edge fool you! 👇

Edit: User ASL97 submitted another variation of this test where the opposite happened:

~ $ cat test2.py
import time
n = 100000000
start = time.time()
a = b'x' * n
stop = time.time()
print('a = b\'x\' * n', stop - start)

start = time.time()
b = bytes(a)
stop = time.time()
print('bytes(a)', stop - start)
start = time.time()
b = bytearray(b'x') * n
stop = time.time()
print('bytearray(b\'x\') * n', stop - start)
start = time.time()
b = bytearray(a)
stop = time.time()
print('bytearray(a)', stop - start)

~ $ python test2.py
a = b'x' * n 0.03690695762634277
bytes(a) 4.76837158203125e-06
bytearray(b'x') * n 0.03336071968078613
bytearray(a) 0.04361891746520996

ASL97 correctly pointed out: “To avoid run to run variation, it is advisable to increase n by a factor of 10, bytearray is faster when done correctly even with the initial overhead of converting bytes to bytearray, b'x' * n vs bytearray(b'x') * n.”
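One way to reduce run-to-run variation further is the standard library's timeit module, which repeats each statement and lets you take the best time. The sketch below is not ASL97's test. It is a hedged variation that times only the construction of each object, and n is deliberately kept small here so that it runs quickly, so raise it for a meaningful measurement. The results depend on your machine and Python version, which is why no expected output is shown.

```python
import timeit

n = 1_000_000  # assumption: kept small so the sketch runs fast; increase for real tests

# Each callable builds its object from scratch, so both types are
# measured the same way. min() keeps the least disturbed of the repeats.
t_bytes = min(timeit.repeat(lambda: b'x' * n, number=10, repeat=5))
t_bytearray = min(timeit.repeat(lambda: bytearray(b'x') * n, number=10, repeat=5))

print('b\'x\' * n            ', t_bytes)
print('bytearray(b\'x\') * n ', t_bytearray)
```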

I also consider ASL97’s performance test superior to mine, for the reason given in this feedback:

The provided test on that page is not measuring what it seem to be suggesting. For bytes, it is measuring the time to create the byte object and converting bytes to bytes which is basically a do nothing. For bytearray, it is measuring the time to create the same byte object and converting bytes to bytearray

Thanks for the feedback! ♥️
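The point about bytes-to-bytes conversion being "basically a do nothing" can be checked directly. The sketch below assumes CPython, where bytes(a) returns the very same object when a is already a bytes object. That is an implementation detail rather than a language guarantee. In contrast, bytearray(a) has to allocate a new buffer and copy the data.

```python
a = b'x' * 1000

same = bytes(a)        # on CPython, no copy is made for an exact bytes object
copy = bytearray(a)    # allocates a new buffer and copies the contents

print(same is a)       # True on CPython (implementation detail)
print(copy == a)       # True: equal contents, but a separate object

copy[0] = 0            # modifying the copy ...
print(a[:3])           # b'xxx' ... leaves the original untouched
```

This explains why bytes(a) took only microseconds in ASL97's output, while bytearray(a) took about as long as building the data in the first place.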

In most cases, it’s best to optimize for readability and suitability for the problem:

Do you need to change the data structure holding bytes after creation? Use the mutable bytearray().
Do you create the data structure once and then only read its information? Use the immutable bytes().
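The two cases can also be combined: build the data in a bytearray, then convert it to bytes once it is final. The helper below, build_packet, is a hypothetical example invented for this sketch and not part of any library. It prefixes a payload with a one-byte length field.

```python
def build_packet(payload: bytes) -> bytes:
    """Hypothetical helper: prefix the payload with a one-byte length field."""
    buf = bytearray()            # mutable while the packet is assembled
    buf.append(len(payload))     # raises ValueError if the length exceeds 255
    buf.extend(payload)
    return bytes(buf)            # immutable once the packet is complete


packet = build_packet(b'hello')
print(packet)   # b'\x05hello'
```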

Don’t do premature optimization!

👉 Recommended Tutorial: Premature Optimization is the Root of All Evil
