Iterators, Iterables and Itertools

Iterables and iterators are everywhere in Python. We usually aren’t aware of the iterators because the syntax of python hides them from us. Almost every time we manipulate a sequence type (strings, lists, tuples, sets, arrays, etc.), we’re using an iterator behind the scenes.

An iterable represents a sequence of values each of which is returned one at a time when the corresponding iterator is invoked.

>>> import sys; sys.version
'3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]'

ThisĀ articleĀ explainsĀ theĀ iteratorĀ protocolĀ toĀ deepenĀ understandingĀ ofĀ theĀ basicsĀ andĀ presentsĀ someĀ ofĀ theĀ mostĀ usefulĀ toolsĀ inĀ theĀ itertoolsĀ module whichĀ canĀ beĀ helpfulĀ whenĀ theĀ basicsĀ aren’tĀ enoughĀ toĀ getĀ theĀ jobĀ done.Ā Ā Also,Ā we’llĀ examineĀ whyĀ iteratorsĀ canĀ beĀ muchĀ moreĀ efficientĀ thanĀ standard containers.

What are iterables and iterators?

The list [1, 2, 3] is an iterable. We can get its elements one at a time using the for-in construct.

l = list([1, 2, 3])
for i in l:
    print(i)

Output:

1
2
3

Now let’s expose what’s going on inside. First, let’s look at the methods that l provides (the dir function lists the methods of an object).

>>> dir(l)
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

The significant method for our current purposes is __iter__. This is what makes l an interable. The __iter__ returns an iterator. Let’s get our hands on an iterator and explore it.

>>> l.__iter__()
<list_iterator at 0x2b0659d6088>
>>> type(l.__iter__())
list_iterator

Another way to get at the iterator for an iterable is the iter function. As you can see, it is just a more succinct way to retrieve the iterator.

>>> my_iterator = iter(l); my_iterator
<list_iterator at 0x2b0659dc688>
>>> my_iterator = iter(l); my_iterator
<list_iterator at 0x2b0659dcac8>

Note: there’s a subtlety here: each time __iter__ or iter is called, a new iterator instance is returned. Each can be called separately. Each one of these is independent and operating with one has no effect on the other(s). This is important for concurrency when multiple processes need to independently operate on the iterable. For now, we can put this aside and look at what we can do with the iterator object.

>>> dir(my_iterator)
['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__length_hint__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

The significant method for our current purposes is __next__. This is what makes the object an iterator. The __next__ method returns the next value from the iterable when called.

>>> my_iterator.__next__()
1
>>> my_iterator.__next__()
2

The builtin function next() does the same thing as calling the __next__ method (similar to iter and .__iter__).

>>> next(my_iterator)
3

Now it’s interesting to see what happens when next() is called again. At this point, we’ve reached the end of the values.

>>> next(my_iterator)
---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

<ipython-input-13-26be35a80dc3> in <module>
----> 1 next(my_iterator)

StopIteration: 

As you can see, the iterator raises the StopIteration exception (and will continue to do so if called again). This signals that there are no more values left (we say that the iterator is exhausted).

And now you can see what for-in does behind the scenes. The actual code does something like the following:

done = False
it = iter(l)
while not done:
    try:
        print(next(it))
    except StopIteration:
        done = True

Output:

1
2
3

Building iterators

Now let’s build our own iterator that does something a little different to demonstrate how to build your own and also see how the pieces above come together.

This one takes an iterable and a step size, n (and optional offset), and will return every nth element.

class nth_elems():
    def __init__(self, contents, stride, start=0):
        self.contents = contents
        self.stride = stride
        self.start = start
        self.pointer = self.start
    def __iter__(self):
        return self
    def __next__(self):
        if self.pointer < len(self.contents):
            value = self.contents[self.pointer]
            self.pointer += self.stride
            return value
        else:
            raise StopIteration 

thing = nth_elems(range(10), 3)
print(thing)
# <__main__.nth_elems at 0x2b0659e5088>

print(type(thing))
# __main__.nth_elems

print(dir(thing))
'''
['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'contents',
 'pointer',
 'start',
 'stride']
'''

for t in thing:
    print(t)
'''
0
3
6
9
'''

What’s going on here? We wrap an interable (range(10) in the iterator we’ve just built and the for loop takes care of driving our iterator (with next()) and takes care of catching the StopIteration when we’ve reached the end).

You can argue that the same thing could be done by the for loop and you’d be correct but the start argument adds a functionality that isn’t available in the for loop.

thing = nth_elems(range(10), 3, start=2)
for t in thing:
    print(t)
'''
2
5
8
'''

So iterators can be used to build custom behaviors which can be better suited to the problem at hand. And in the usual fashion, python provides a module that adds functionality to the base language and allows you to reuse useful iteration patterns without having to build them yourself.

Interlude: Why iterators

Iterators and lists or tuples both represent collections of items that can be accessed one at a time and can be consumed or processed with for loops and similar constructs. Why use iterators at all?

The reason is simple: lists consume memory for every item in the list. An iterator can retrieve or construct each item as needed and because of that only requires enough memory to store one item.

Let’s look at an example so we can see exactly what that can mean.

>>> import sys
>>> sys.getsizeof(list(range(1000000)))
9000112
>>> sys.getsizeof(range(1000000))
48

So if you are accessing a data structure one element at a time, it can pay huge dividends in both memory and performance to implement an iterator for the object.

The itertools module

The itertools module is a collection of useful iteration patterns and includes 3 basic kinds of iterators: infinite iterators, finite iterators and combinatoric iterators. We give examples of each type below.

infinite iterators

The infinite iterators will keep yielding values until you stop calling for them. They’re great for marking other iterables in some useful way.

>>> from itertools import count
>>> count()
count(0)
>>> list(zip('beluga', count()))
[('b', 0), ('e', 1), ('l', 2), ('u', 3), ('g', 4), ('a', 5)]
>>> from itertools import cycle
>>> list(zip('beluga', cycle([1, 2, 3])))
[('b', 1), ('e', 2), ('l', 3), ('u', 1), ('g', 2), ('a', 3)]
>>> from itertools import repeat
>>> list(zip('beluga', repeat([1, 2, 3])))
[('b', [1, 2, 3]),
 ('e', [1, 2, 3]),
 ('l', [1, 2, 3]),
 ('u', [1, 2, 3]),
 ('g', [1, 2, 3]),
 ('a', [1, 2, 3])]

Finite iterators

Finite iterators are exhausted when their inputs are used up. There are about a dozen of these. Here are a few examples to whet your appetite:

Starmap

This one has the coolest name. It takes a function and an iterable and applies the function to the elements. The number of members of each element should correspond to the number of arguments to the function.

from math import sqrt
from itertools import starmap

discriminants = [x for x in starmap(lambda a, b, c: sqrt(b**2 - 4*a*c), 
                                    [(1, -2 , 1), (1, 4, 4)])]
print(discriminants)
# [0.0, 0.0]

Chain

Chain allows multiple iterators to be treated as a single sequence.

from itertools import chain
for c in chain('separate', 'words'):
    print(c)
'''
s
e
p
a
r
a
t
e
w
o
r
d
s
'''

Accumulate

Accumulate captures the all the intermediate results of applying a function of two arguments succesively to each element of the input interable and the result thus far.

This allows us to capture running totals. You can use user-defined functions, lambda functions or import operators to use efficient implementations of python’s built-in operators with function syntax.

# factorial
from itertools import accumulate
import operator 
list(accumulate(range(1, 10), operator.mul))
# [1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
# operator.add is the default function
# running total
from itertools import accumulate
list(accumulate(range(1, 10)))
# [1, 3, 6, 10, 15, 21, 28, 36, 45]

Combinatorial iterators

The combinatorial iterators are extremely handy when you need to use a group of combinations of items.

>>> from itertools import product, permutations, combinations, combinations_with_replacement

Product

Product produces the same result as a nested for loop.

>>> list(product('abc', 'def'))
[('a', 'd'),
 ('a', 'e'),
 ('a', 'f'),
 ('b', 'd'),
 ('b', 'e'),
 ('b', 'f'),
 ('c', 'd'),
 ('c', 'e'),
 ('c', 'f')]

Permutations

Permutations returns all possible unique selections of length n from the input iterable.

>>> list(permutations(['red', 'green', 'blue'], 2))
[('red', 'green'),
 ('red', 'blue'),
 ('green', 'red'),
 ('green', 'blue'),
 ('blue', 'red'),
 ('blue', 'green')]

Combinations

Combinations returns all possible unique selections of length n from the input iterable ignoring order (i.e only one of [('red', green), ('green', 'red')]).

>>> list(combinations(['red', 'green', 'blue'], 2))
[('red', 'green'), ('red', 'blue'), ('green', 'blue')]

Combinations

Combinations returns all possible unique selections of length n from the input iterable ignoring order but allowing multiple choices of the same selection..

>>> list(combinations_with_replacement(['red', 'green', 'blue'], 2))
[('red', 'red'),
 ('red', 'green'),
 ('red', 'blue'),
 ('green', 'green'),
 ('green', 'blue'),
 ('blue', 'blue')]

Closing remarks

The documentation for the itertools ends with a group of recipes that use itertools functions along with standard python to produce a wide range of iteration patterns. When faced with an iteration challenge, it’s a great idea to check if there’s one applicable to the problem at hand.

In addition, there is another module, more_itertools that implements the recipes in the itertools documentation and many more useful patterns. We end with a few examples that should provide motivation to explore this wonderful module.

>>> from more_itertools import flatten, pairwise, grouper

Flatten

Flatten removes one level of nesting from a list of lists

>>> list(flatten([['a', 'b'], [1, 2]]))
['a', 'b', 1, 2]

Pairwise

This handy function returns all successive pairs of elements.

>>> list(pairwise(['red', 'orange', 'green', 'blue']))
[('red', 'orange'), ('orange', 'green'), ('green', 'blue')]

Grouper

This function breaks the input into chunks of the size argument.

>>> list(grouper(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'], 3))
[('red', 'orange', 'yellow'),
 ('green', 'blue', 'indigo'),
 ('violet', None, None)]