Iterables and iterators are everywhere in Python. We usually aren’t aware of the iterators because the syntax of python hides them from us. Almost every time we manipulate a sequence type (strings, lists, tuples, sets, arrays, etc.), we’re using an iterator behind the scenes.
An iterable represents a sequence of values each of which is returned one at a time when the corresponding iterator is invoked.
>>> import sys; sys.version '3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]'
ThisĀ articleĀ explainsĀ theĀ iteratorĀ protocolĀ toĀ deepenĀ understandingĀ ofĀ theĀ basicsĀ andĀ presentsĀ someĀ ofĀ theĀ mostĀ usefulĀ toolsĀ inĀ theĀ itertools
Ā module whichĀ canĀ beĀ helpfulĀ whenĀ theĀ basicsĀ aren’tĀ enoughĀ toĀ getĀ theĀ jobĀ done.Ā Ā Also,Ā we’llĀ examineĀ whyĀ iteratorsĀ canĀ beĀ muchĀ moreĀ efficientĀ thanĀ standard containers.
What are iterables and iterators?
The list [1, 2, 3]
is an iterable. We can get its elements one at a time using the for-in construct.
l = list([1, 2, 3]) for i in l: print(i)
Output:
1 2 3
Now let’s expose what’s going on inside. First, let’s look at the methods that l provides (the dir function lists the methods of an object).
>>> dir(l) ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
The significant method for our current purposes is __iter__
. This is what makes l an interable. The __iter__
returns an iterator. Let’s get our hands on an iterator and explore it.
>>> l.__iter__() <list_iterator at 0x2b0659d6088> >>> type(l.__iter__()) list_iterator
Another way to get at the iterator for an iterable is the iter
function. As you can see, it is just a more succinct way to retrieve the iterator.
>>> my_iterator = iter(l); my_iterator <list_iterator at 0x2b0659dc688> >>> my_iterator = iter(l); my_iterator <list_iterator at 0x2b0659dcac8>
Note: there’s a subtlety here: each time __iter__
or iter
is called, a new iterator instance is returned. Each can be called separately. Each one of these is independent and operating with one has no effect on the other(s). This is important for concurrency when multiple processes need to independently operate on the iterable. For now, we can put this aside and look at what we can do with the iterator object.
>>> dir(my_iterator) ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__length_hint__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__']
The significant method for our current purposes is __next__
. This is what makes the object an iterator. The __next__
method returns the next value from the iterable when called.
>>> my_iterator.__next__() 1 >>> my_iterator.__next__() 2
The builtin function next()
does the same thing as calling the __next__
method (similar to iter
and .__iter__
).
>>> next(my_iterator) 3
Now it’s interesting to see what happens when next()
is called again. At this point, we’ve reached the end of the values.
>>> next(my_iterator) --------------------------------------------------------------------------- StopIteration Traceback (most recent call last) <ipython-input-13-26be35a80dc3> in <module> ----> 1 next(my_iterator) StopIteration:
As you can see, the iterator raises the StopIteration
exception (and will continue to do so if called again). This signals that there are no more values left (we say that the iterator is exhausted).
And now you can see what for-in
does behind the scenes. The actual code does something like the following:
done = False it = iter(l) while not done: try: print(next(it)) except StopIteration: done = True
Output:
1 2 3
Building iterators
Now let’s build our own iterator that does something a little different to demonstrate how to build your own and also see how the pieces above come together.
This one takes an iterable and a step size, n (and optional offset), and will return every nth element.
class nth_elems(): def __init__(self, contents, stride, start=0): self.contents = contents self.stride = stride self.start = start self.pointer = self.start def __iter__(self): return self def __next__(self): if self.pointer < len(self.contents): value = self.contents[self.pointer] self.pointer += self.stride return value else: raise StopIteration thing = nth_elems(range(10), 3) print(thing) # <__main__.nth_elems at 0x2b0659e5088> print(type(thing)) # __main__.nth_elems print(dir(thing)) ''' ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'contents', 'pointer', 'start', 'stride'] ''' for t in thing: print(t) ''' 0 3 6 9 '''
What’s going on here? We wrap an interable (range(10)
in the iterator we’ve just built and the for loop takes care of driving our iterator (with next()
) and takes care of catching the StopIteration when we’ve reached the end).
You can argue that the same thing could be done by the for loop and you’d be correct but the start argument adds a functionality that isn’t available in the for loop.
thing = nth_elems(range(10), 3, start=2) for t in thing: print(t) ''' 2 5 8 '''
So iterators can be used to build custom behaviors which can be better suited to the problem at hand. And in the usual fashion, python provides a module that adds functionality to the base language and allows you to reuse useful iteration patterns without having to build them yourself.
Interlude: Why iterators
Iterators and lists or tuples both represent collections of items that can be accessed one at a time and can be consumed or processed with for loops and similar constructs. Why use iterators at all?
The reason is simple: lists consume memory for every item in the list. An iterator can retrieve or construct each item as needed and because of that only requires enough memory to store one item.
Let’s look at an example so we can see exactly what that can mean.
>>> import sys >>> sys.getsizeof(list(range(1000000))) 9000112 >>> sys.getsizeof(range(1000000)) 48
So if you are accessing a data structure one element at a time, it can pay huge dividends in both memory and performance to implement an iterator for the object.
The itertools module
The itertools module is a collection of useful iteration patterns and includes 3 basic kinds of iterators: infinite iterators, finite iterators and combinatoric iterators. We give examples of each type below.
infinite iterators
The infinite iterators will keep yielding values until you stop calling for them. They’re great for marking other iterables in some useful way.
>>> from itertools import count >>> count() count(0) >>> list(zip('beluga', count())) [('b', 0), ('e', 1), ('l', 2), ('u', 3), ('g', 4), ('a', 5)]
>>> from itertools import cycle >>> list(zip('beluga', cycle([1, 2, 3]))) [('b', 1), ('e', 2), ('l', 3), ('u', 1), ('g', 2), ('a', 3)] >>> from itertools import repeat >>> list(zip('beluga', repeat([1, 2, 3]))) [('b', [1, 2, 3]), ('e', [1, 2, 3]), ('l', [1, 2, 3]), ('u', [1, 2, 3]), ('g', [1, 2, 3]), ('a', [1, 2, 3])]
Finite iterators
Finite iterators are exhausted when their inputs are used up. There are about a dozen of these. Here are a few examples to whet your appetite:
Starmap
This one has the coolest name. It takes a function and an iterable and applies the function to the elements. The number of members of each element should correspond to the number of arguments to the function.
from math import sqrt from itertools import starmap discriminants = [x for x in starmap(lambda a, b, c: sqrt(b**2 - 4*a*c), [(1, -2 , 1), (1, 4, 4)])] print(discriminants) # [0.0, 0.0]
Chain
Chain allows multiple iterators to be treated as a single sequence.
from itertools import chain for c in chain('separate', 'words'): print(c) ''' s e p a r a t e w o r d s '''
Accumulate
Accumulate captures the all the intermediate results of applying a function of two arguments succesively to each element of the input interable and the result thus far.
This allows us to capture running totals. You can use user-defined functions, lambda functions or import operators to use efficient implementations of python’s built-in operators with function syntax.
# factorial from itertools import accumulate import operator list(accumulate(range(1, 10), operator.mul)) # [1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
# operator.add is the default function # running total from itertools import accumulate list(accumulate(range(1, 10))) # [1, 3, 6, 10, 15, 21, 28, 36, 45]
Combinatorial iterators
The combinatorial iterators are extremely handy when you need to use a group of combinations of items.
>>> from itertools import product, permutations, combinations, combinations_with_replacement
Product
Product produces the same result as a nested for loop.
>>> list(product('abc', 'def')) [('a', 'd'), ('a', 'e'), ('a', 'f'), ('b', 'd'), ('b', 'e'), ('b', 'f'), ('c', 'd'), ('c', 'e'), ('c', 'f')]
Permutations
Permutations returns all possible unique selections of length n from the input iterable.
>>> list(permutations(['red', 'green', 'blue'], 2)) [('red', 'green'), ('red', 'blue'), ('green', 'red'), ('green', 'blue'), ('blue', 'red'), ('blue', 'green')]
Combinations
Combinations returns all possible unique selections of length n from the input iterable ignoring order (i.e only one of [('red', green), ('green', 'red')]
).
>>> list(combinations(['red', 'green', 'blue'], 2)) [('red', 'green'), ('red', 'blue'), ('green', 'blue')]
Combinations
Combinations returns all possible unique selections of length n from the input iterable ignoring order but allowing multiple choices of the same selection..
>>> list(combinations_with_replacement(['red', 'green', 'blue'], 2)) [('red', 'red'), ('red', 'green'), ('red', 'blue'), ('green', 'green'), ('green', 'blue'), ('blue', 'blue')]
Closing remarks
The documentation for the itertools ends with a group of recipes that use itertools functions along with standard python to produce a wide range of iteration patterns. When faced with an iteration challenge, it’s a great idea to check if there’s one applicable to the problem at hand.
In addition, there is another module, more_itertools
that implements the recipes in the itertools documentation and many more useful patterns. We end with a few examples that should provide motivation to explore this wonderful module.
>>> from more_itertools import flatten, pairwise, grouper
Flatten
Flatten removes one level of nesting from a list of lists
>>> list(flatten([['a', 'b'], [1, 2]])) ['a', 'b', 1, 2]
Pairwise
This handy function returns all successive pairs of elements.
>>> list(pairwise(['red', 'orange', 'green', 'blue'])) [('red', 'orange'), ('orange', 'green'), ('green', 'blue')]
Grouper
This function breaks the input into chunks of the size argument.
>>> list(grouper(['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet'], 3)) [('red', 'orange', 'yellow'), ('green', 'blue', 'indigo'), ('violet', None, None)]