Iterables and iterators are everywhere in Python. We usually aren’t aware of the iterators because the syntax of python hides them from us. Almost every time we manipulate a sequence type (strings, lists, tuples, sets, arrays, etc.), we’re using an iterator behind the scenes.
An iterable represents a sequence of values each of which is returned one at a time when the corresponding iterator is invoked.
>>> import sys; sys.version '3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]'
This article explains the iterator protocol to deepen understanding of the basics and presents some of the most useful tools in the
itertools module which can be helpful when the basics aren’t enough to get the job done. Also, we’ll examine why iterators can be much more efficient than standard containers.
What are iterables and iterators?
[1, 2, 3] is an iterable. We can get its elements one at a time using the for-in construct.
l = list([1, 2, 3]) for i in l: print(i)
1 2 3
Now let’s expose what’s going on inside. First, let’s look at the methods that l provides (the dir function lists the methods of an object).
>>> dir(l) ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
The significant method for our current purposes is
__iter__. This is what makes l an interable. The
__iter__ returns an iterator. Let’s get our hands on an iterator and explore it.
>>> l.__iter__() <list_iterator at 0x2b0659d6088> >>> type(l.__iter__()) list_iterator
Another way to get at the iterator for an iterable is the
iter function. As you can see, it is just a more succinct way to retrieve the iterator.
>>> my_iterator = iter(l); my_iterator <list_iterator at 0x2b0659dc688> >>> my_iterator = iter(l); my_iterator <list_iterator at 0x2b0659dcac8>
Note: there’s a subtlety here: each time
iter is called, a new iterator instance is returned. Each can be called separately. Each one of these is independent and operating with one has no effect on the other(s). This is important for concurrency when multiple processes need to independently operate on the iterable. For now, we can put this aside and look at what we can do with the iterator object.
>>> dir(my_iterator) ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']
The significant method for our current purposes is
__next__. This is what makes the object an iterator. The
__next__ method returns the next value from the iterable when called.
>>> my_iterator.__next__() 1 >>> my_iterator.__next__() 2
The builtin function
next() does the same thing as calling the
__next__ method (similar to
>>> next(my_iterator) 3
Now it’s interesting to see what happens when
next() is called again. At this point, we’ve reached the end of the values.
>>> next(my_iterator) --------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-13-26be35a80dc3> in <module> ----> 1 next(my_iterator) StopIteration:
As you can see, the iterator raises the
StopIteration exception (and will continue to do so if called again). This signals that there are no more values left (we say that the iterator is exhausted).
And now you can see what
for-in does behind the scenes. The actual code does something like the following:
done = False it = iter(l) while not done: try: print(next(it)) except StopIteration: done = True
1 2 3
Now let’s build our own iterator that does something a little different to demonstrate how to build your own and also see how the pieces above come together.
This one takes an iterable and a step size, n (and optional offset), and will return every nth element.
class nth_elems(): def __init__(self, contents, stride, start=0): self.contents = contents self.stride = stride self.start = start self.pointer = self.start def __iter__(self): return self def __next__(self): if self.pointer < len(self.contents): value = self.contents[self.pointer] self.pointer += self.stride return value else: raise StopIteration thing = nth_elems(range(10), 3) print(thing) # <__main__.nth_elems at 0x2b0659e5088> print(type(thing)) # __main__.nth_elems print(dir(thing)) ''' ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'contents', 'pointer', 'start', 'stride'] ''' for t in thing: print(t) ''' 0 3 6 9 '''
What’s going on here? We wrap an interable (
range(10) in the iterator we’ve just built and the for loop takes care of driving our iterator (with
next()) and takes care of catching the StopIteration when we’ve reached the end).
You can argue that the same thing could be done by the for loop and you’d be correct but the start argument adds a functionality that isn’t available in the for loop.
thing = nth_elems(range(10), 3, start=2) for t in thing: print(t) ''' 2 5 8 '''
So iterators can be used to build custom behaviors which can be better suited to the problem at hand. And in the usual fashion, python provides a module that adds functionality to the base language and allows you to reuse useful iteration patterns without having to build them yourself.
Interlude: Why iterators
Iterators and lists or tuples both represent collections of items that can be accessed one at a time and can be consumed or processed with for loops and similar constructs. Why use iterators at all?
The reason is simple: lists consume memory for every item in the list. An iterator can retrieve or construct each item as needed and because of that only requires enough memory to store one item.
Let’s look at an example so we can see exactly what that can mean.
>>> import sys >>> sys.getsizeof(list(range(1000000))) 9000112 >>> sys.getsizeof(range(1000000)) 48
So if you are accessing a data structure one element at a time, it can pay huge dividends in both memory and performance to implement an iterator for the object.
The itertools module
The itertools module is a collection of useful iteration patterns and includes 3 basic kinds of iterators: infinite iterators, finite iterators and combinatoric iterators. We give examples of each type below.
The infinite iterators will keep yielding values until you stop calling for them. They’re great for marking other iterables in some useful way.
>>> from itertools import count >>> count() count(0) >>> list(zip('beluga', count())) [('b', 0), ('e', 1), ('l', 2), ('u', 3), ('g', 4), ('a', 5)]
>>> from itertools import cycle >>> list(zip('beluga', cycle([1, 2, 3]))) [('b', 1), ('e', 2), ('l', 3), ('u', 1), ('g', 2), ('a', 3)] >>> from itertools import repeat >>> list(zip('beluga', repeat([1, 2, 3]))) [('b', [1, 2, 3]), ('e', [1, 2, 3]), ('l', [1, 2, 3]), ('u', [1, 2, 3]), ('g', [1, 2, 3]), ('a', [1, 2, 3])]
Finite iterators are exhausted when their inputs are used up. There are about a dozen of these. Here are a few examples to whet your appetite:
This one has the coolest name. It takes a function and an iterable and applies the function to the elements. The number of members of each element should correspond to the number of arguments to the function.
from math import sqrt from itertools import starmap discriminants = [x for x in starmap(lambda a, b, c: sqrt(b**2 - 4*a*c), [(1, -2 , 1), (1, 4, 4)])] print(discriminants) # [0.0, 0.0]
Chain allows multiple iterators to be treated as a single sequence.
from itertools import chain for c in chain('separate', 'words'): print(c) ''' s e p a r a t e w o r d s '''
Accumulate captures the all the intermediate results of applying a function of two arguments succesively to each element of the input interable and the result thus far.
This allows us to capture running totals. You can use user-defined functions, lambda functions or import operators to use efficient implementations of python’s built-in operators with function syntax.
# factorial from itertools import accumulate import operator list(accumulate(range(1, 10), operator.mul)) # [1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
# operator.add is the default function # running total from itertools import accumulate list(accumulate(range(1, 10))) # [1, 3, 6, 10, 15, 21, 28, 36, 45]
The combinatorial iterators are extremely handy when you need to use a group of combinations of items.
>>> from itertools import product, permutations, combinations, combinations_with_replacement
Product produces the same result as a nested for loop.
>>> list(product('abc', 'def')) [('a', 'd'), ('a', 'e'), ('a', 'f'), ('b', 'd'), ('b', 'e'), ('b', 'f'), ('c', 'd'), ('c', 'e'), ('c', 'f')]
Permutations returns all possible unique selections of length n from the input iterable.
>>> list(permutations(['red', 'green', 'blue'], 2)) [('red', 'green'), ('red', 'blue'), ('green', 'red'), ('green', 'blue'), ('blue', 'red'), ('blue', 'green')]
Combinations returns all possible unique selections of length n from the input iterable ignoring order (i.e only one of
[('red', green), ('green', 'red')]).
>>> list(combinations(['red', 'green', 'blue'], 2)) [('red', 'green'), ('red', 'blue'), ('green', 'blue')]
Combinations returns all possible unique selections of length n from the input iterable ignoring order but allowing multiple choices of the same selection..
>>> list(combinations_with_replacement(['red', 'green', 'blue'], 2)) [('red', 'red'), ('red', 'green'), ('red', 'blue'), ('green', 'green'), ('green', 'blue'), ('blue', 'blue')]
The documentation for the itertools ends with a group of recipes that use itertools functions along with standard python to produce a wide range of iteration patterns. When faced with an iteration challenge, it’s a great idea to check if there’s one applicable to the problem at hand.
In addition, there is another module,
more_itertools that implements the recipes in the itertools documentation and many more useful patterns. We end with a few examples that should provide motivation to explore this wonderful module.
>>> from more_itertools import flatten, pairwise, grouper
Flatten removes one level of nesting from a list of lists
>>> list(flatten([['a', 'b'], [1, 2]])) ['a', 'b', 1, 2]
This handy function returns all successive pairs of elements.
>>> list(pairwise(['red', 'orange', 'green', 'blue'])) [('red', 'orange'), ('orange', 'green'), ('green', 'blue')]
This function breaks the input into chunks of the size argument.
>>> list(grouper(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'], 3)) [('red', 'orange', 'yellow'), ('green', 'blue', 'indigo'), ('violet', None, None)]