In this article, I’ll show you how to divide a list into equally sized chunks in Python. We’ll build the solution step by step, and I’ll explain each part in detail along the way.
Chunking Your List
Let’s make this question more tangible by turning it into a practical problem:
Problem: Imagine that you have a temperature sensor that sends data every 6 minutes, which makes 10 data points per hour. All these data points are stored in one list for each day.
Now, we want to have a list of hourly average temperatures for each day—this is why we need to split the list of data for one day into evenly sized chunks.
Solution: To achieve this, we use a for-loop and Python’s built-in function range(), which we need to examine in depth.
The range() function can be called with one, two, or three arguments.
- If you use it with a single argument, e.g., range(10), we get a range object containing the numbers 0 to 9. So, if you call range() with one argument, this argument is interpreted as the max or stop value of the range, but it is excluded from the range.
- You can also call the range() function with two arguments, e.g., range(5, 10). This call returns a range object containing the numbers 5 to 9. So, now we have a lower and an upper bound for the range. Unlike the stop value, the start value is included in the range.
- In a call of range() with three arguments, the first argument is the start value, the second is the stop value, and the third is the step size. For example, range(5, 15, 2) returns a range object containing the values 5, 7, 9, 11, 13. As you can see, the range begins at the start value and keeps adding the step value as long as the result is less than the stop value.
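The three call forms described above can be checked directly in a Python shell. The following is only a small sketch: the variable names are made up for illustration, and list() is used just to make the values of each range object visible.

```python
# range() with one, two, and three arguments.
# list() turns each range object into a list so we can see its values.
one_arg = list(range(10))            # stop only
two_args = list(range(5, 10))        # start, stop
three_args = list(range(5, 15, 2))   # start, stop, step

print(one_arg)      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(two_args)     # [5, 6, 7, 8, 9]
print(three_args)   # [5, 7, 9, 11, 13]
```

In every case the stop value itself is missing from the result, while the start value is included.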
In our problem, each chunk has a length of 10, so the step size is 10, the start value is 0, and the stop value is the length of the list of data.
Putting it all together: Calling range(0, len(data), 10) gives us exactly what we need to iterate over the chunks. Let’s put some numbers there to visualize it.
For one single day, we have a data length of 24 * 10 = 240, so the call of the range function would be range(0, 240, 10) and the resulting range would be 0, 10, 20, 30, …, 230. Pause a moment and consider these values: they represent the indices of the first element of each chunk.
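To double-check the arithmetic, here is a short sketch that builds these start indices. The names points_per_hour, day_length and start_indices are chosen for this illustration only.

```python
points_per_hour = 10                 # one reading every 6 minutes
day_length = 24 * points_per_hour    # 240 readings per day

# One start index per hourly chunk.
start_indices = list(range(0, day_length, points_per_hour))

print(len(start_indices))    # 24
print(start_indices[:4])     # [0, 10, 20, 30]
print(start_indices[-1])     # 230
```

There are 24 start indices, one for each hour of the day, and the last chunk starts at index 230.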
So what do we have now? The start indices of each chunk and also the length – and that’s all we need to slice the input data into the chunks we need.
The slicing operator takes two or three arguments separated by the colon symbol (:). They have the same meaning as in the range function: a start value, a stop value, and an optional step size.
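As a quick illustration of how slicing mirrors range(), here is a sketch on a small, made-up list of letters. It also shows that a slice reaching past the end of the list does not raise an error, it simply stops at the last element.

```python
letters = list("abcdefghij")     # hypothetical sample data, indices 0 to 9

two_part = letters[2:5]          # start 2, stop 5 (excluded)
three_part = letters[0:10:3]     # start 0, stop 10, step 3
past_end = letters[8:18]         # stop lies beyond the end of the list

print(two_part)     # ['c', 'd', 'e']
print(three_part)   # ['a', 'd', 'g', 'j']
print(past_end)     # ['i', 'j']
```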
A first draft of our code could be this:
data = [15.7, 16.2, 16.5, 15.9, ..., 27.3, 26.4, 26.1, 27.2]
chunk_length = 10

for i in range(0, len(data), chunk_length):
    print(data[i:i+chunk_length])
However, we can still improve this code and make it reusable by creating a generator out of it.
Chunking With Generator Expressions
A generator is a function that uses the keyword yield instead of a return statement. Each yield pauses the function and hands a value back to the caller. The next time a value is requested, the function resumes where it left off, produces the next value, and pauses again. This behavior is useful in a for-loop, where we want to get a value from the generator, work with this value inside the loop, and then repeat with the next value. Now, let’s take a look at the improved version of our code:
data = [15.7, 16.2, 16.5, 15.9, ..., 27.3, 26.4, 26.1, 27.2]
chunk_length = 10

def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

for chunk in make_chunks(data, chunk_length):
    print(chunk)
That already looks pretty Pythonic, and we can reuse the function make_chunks() for all the other data we need to process.
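To see the pausing behavior of yield outside of a for-loop, one option is to request values by hand with the built-in next(). This is only a sketch on a tiny, made-up list called readings.

```python
def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

readings = [1, 2, 3, 4, 5, 6]        # hypothetical sample data
chunks = make_chunks(readings, 2)    # nothing has been computed yet

first = next(chunks)       # runs the function up to the first yield
second = next(chunks)      # resumes after the first yield
remaining = list(chunks)   # consumes whatever is left

print(first)       # [1, 2]
print(second)      # [3, 4]
print(remaining)   # [[5, 6]]
```

A for-loop does the same thing behind the scenes: it keeps asking the generator for the next value until there is none left.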
Let’s finish the code so that we get a list of hourly average temperatures as the result.
import random

def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

def process(chunk):
    # average of one chunk, rounded to two decimals
    return round(sum(chunk)/len(chunk), 2)

n = 10
# generate random temperature values
day_temperatures = [random.random() * 20 for x in range(24 * n)]

avg_per_hour = []
for chunk in make_chunks(day_temperatures, n):
    r = process(chunk)
    avg_per_hour.append(r)

print(avg_per_hour)
And that’s it, this Pythonic code solves our problem. We can make the code a bit shorter still, but I consider the shorter version less readable because you need to know more advanced Python concepts.
import random

make_chunks = lambda data, n: (data[i:i + n] for i in range(0, len(data), n))
process = lambda data: round(sum(data)/len(data), 2)

n = 10
# generate random temperature values
day_temperatures = [random.random() * 20 for x in range(24 * n)]

avg_per_hour = []
for chunk in make_chunks(day_temperatures, n):
    r = process(chunk)
    avg_per_hour.append(r)

print(avg_per_hour)
So, what did we do? We reduced the helper functions to lambda expressions, and we replaced the generator function with a special shorthand: a generator expression, which is written in parentheses.
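To check that the shorthand really behaves like the generator function, the two versions can be compared side by side. The names make_chunks_def, make_chunks_lambda and sample below are made up for this sketch.

```python
# Generator function, as in the longer version of the code.
def make_chunks_def(data, n):
    for i in range(0, len(data), n):
        yield data[i:i + n]

# Lambda returning a generator expression, as in the shorter version.
make_chunks_lambda = lambda data, n: (data[i:i + n] for i in range(0, len(data), n))

sample = list(range(6))    # [0, 1, 2, 3, 4, 5]
from_def = list(make_chunks_def(sample, 2))
from_lambda = list(make_chunks_lambda(sample, 2))

print(from_def)                  # [[0, 1], [2, 3], [4, 5]]
print(from_def == from_lambda)   # True
```

Both produce their chunks lazily, one at a time, so which one to use is mostly a question of readability.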
To sum up the solution: We used the range function with three arguments, the start value, the stop value and the step value. By setting the step value to our desired chunk length, the start value to 0 and the stop value to the total data length, we get a range object containing all the start indices of our chunks. With the help of slicing we can access exactly the chunk we need in each iteration step.
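One property of this approach is worth keeping in mind: if the length of the data is not a multiple of the chunk length, the last chunk is simply shorter than the others. The sketch below shows this with a made-up list of 25 values.

```python
def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

readings = list(range(25))    # hypothetical data: 25 values, not a multiple of 10
chunk_sizes = [len(chunk) for chunk in make_chunks(readings, 10)]

print(chunk_sizes)   # [10, 10, 5]
```

For the temperature example this is not an issue, because a full day always contains 24 * 10 = 240 values.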