Python Small Integer Caching: "==" versus "is"

This interesting code snippet was brought to my attention by Finxter reader Albrecht.

a, b = 250, 250
for i in range(250, 260):
    if a is not b:
        break
    a += 1
    b += 1
print(a)
# What's the output of this code snippet?

You’d guess that the for loop runs all ten iterations, from i=250 to i=259, incrementing a and b each time. As Python creates one integer object to which both names refer, the check a is not b should always be False, so the loop never breaks. Ten increments starting from 250 would give a=260, right?

Wrong!

The result is a=257.
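
To see where the loop stops, it can help to record the identity check on every pass. The sketch below is the same loop with a hypothetical trace list added (it is not part of the original snippet), and the behavior described in the comments assumes CPython:

```python
# Same loop as above, plus a trace of (value of a, whether a and b are one object).
a, b = 250, 250
trace = []  # hypothetical helper list, added only for illustration
for _ in range(250, 260):
    trace.append((a, a is b))
    if a is not b:
        break
    a += 1
    b += 1

for value, same_object in trace:
    print(value, same_object)
# On CPython, the last line printed is "257 False".
```

Every pass up to 256 reports that a and b are the same object; the first pass where they differ is the one that triggers the break.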

The reason is a CPython implementation detail called “Small Integer Caching”: the interpreter keeps an internal cache of small integer objects.

If you create an integer in the range -5 to 256, CPython does not allocate a new object. It returns a reference to the object that is already cached in memory.

“The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.”

Python Docs
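
The range quoted above can be probed empirically. The following is a minimal sketch, assuming CPython; is_cached is a hypothetical helper name, and the values are built at runtime with int(str(n)) so that the compiler cannot reuse a single constant for both names:

```python
def is_cached(n: int) -> bool:
    """Return True if two runtime-built ints equal to n are the same object."""
    # int(str(n)) constructs the value at runtime, once per name.
    first = int(str(n))
    second = int(str(n))
    return first is second


# Collect every value in a test window for which both names share one object.
cached = [n for n in range(-20, 300) if is_cached(n)]
print(min(cached), max(cached))  # on CPython: -5 256
```

Other Python implementations are free to behave differently, which is exactly why code should not rely on this result.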

Let’s quickly examine the meaning of “is” in Python.

The is operator

The is operator checks if two variable names point to the same object in memory:

>>> a = "hello"
>>> b = "hello"
>>> a is b
True

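Both variables a and b point to the same string object "hello". CPython doesn’t store this string literal twice but creates it only once in memory and reuses it, a technique known as string interning. This saves memory and makes comparisons faster. And it’s not a problem because strings are immutable, so changing one variable can never alter the string seen through the other. Like the integer cache, this is an implementation detail: strings that are built at runtime are not necessarily shared.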

Note that we can use the id() function to get an integer that identifies an object. In CPython, this integer is the object's memory address:

>>> a = "hello"
>>> b = "hello"
>>> id(a)
1505840752992
>>> id(b)
1505840752992

They both point to the same location in memory! Therefore, the is operator returns True!
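
Sharing of equal strings is not guaranteed in general. The sketch below builds a second "hello" at runtime and then uses sys.intern from the standard library to obtain the canonical shared copy; the variable names are illustrative only:

```python
import sys

a = "hello"
b = "".join(["hel", "lo"])  # same characters, but assembled at runtime

print(a == b)  # True: the values are equal
print(a is b)  # implementation-dependent; often False on CPython

# sys.intern returns the canonical interned copy of a string, so two
# interned strings with equal values are the same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True
```

The value comparison is stable in every case, while the identity comparison depends on how each string came into existence.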

Small Integer Caching

Again, if you create an integer in the range -5 to 256, CPython returns a reference to the object that is already cached in memory. But if the integer does not fall into this range, CPython may create a new integer object, even if another object with the same value already exists.

If a and b refer to two such separate objects and we check a is not b, Python gives us the correct result True: the values are equal, but the objects are different.

In fact, this leads to the following surprising behavior in CPython (the C implementation of Python 3) when you enter the lines one by one in the interactive shell:

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
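
The False result for 257 depends on how the code reaches the interpreter. As a rough sketch, assuming CPython (both function names are hypothetical): equal literals compiled together in one function typically share a single constant, while values built at runtime do not.

```python
def literals_in_one_function() -> bool:
    # Both 257 literals are compiled into the same code object,
    # where CPython stores equal constants only once.
    a = 257
    b = 257
    return a is b


def values_built_at_runtime() -> bool:
    # Each int("257") call creates its own object outside the cache range.
    a = int("257")
    b = int("257")
    return a is b


print(literals_in_one_function())  # True on CPython
print(values_built_at_runtime())   # False on CPython
```

The same value can therefore pass or fail an is check depending on context, which is one more reason not to use is for comparing numbers.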

Therefore, you should always compare integers by using the == operator in Python. This ensures that Python compares the values, and not merely the identities (memory addresses) of the objects:

>>> a = 256
>>> b = 256
>>> a == b
True
>>> a = 257
>>> b = 257
>>> a == b
True

What can you learn from this? Implementation details matter!