In this article, you’ll learn how to divide a list into equally sized chunks in Python. Step by step, you’ll arrive at code that chunks your list into evenly sized parts, and I’ll explain every piece of it in detail along the way.
Problem Formulation
Problem: Imagine that you have a temperature sensor that sends data every 6 minutes, which makes 10 data points per hour. All these data points are stored in one list for each day.
Now, we want a list of hourly average temperatures for each day. This is why we need to split each day’s list of data into evenly sized chunks.
Chunking Your List
To chunk your list into consecutive parts of size n, use a for loop to iterate over every n-th index using Python’s built-in function range(0, len(data), chunk_length), where chunk_length is n. Then use each index i as a starting position and take the same number of consecutive elements from it using Python’s slicing feature data[i:i+chunk_length].
Here’s the code:
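```python
# Temperature readings for one day ("..." stands for the readings omitted here)
data = [15.7, 16.2, 16.5, 15.9, ..., 27.3, 26.4, 26.1, 27.2]
chunk_length = 10  # 10 readings per hour

# i takes the values 0, 10, 20, ...: the start index of each chunk
for i in range(0, len(data), chunk_length):
    print(data[i:i+chunk_length])
```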
Background: range() Function
The range() function can be called with one, two, or three arguments.
- If you call it with one single argument, e.g., range(10), you get a range object containing the numbers 0 to 9. A single argument is interpreted as the stop value of the range, and the stop value itself is excluded from the range.
- You can also call range() with two arguments, e.g., range(5, 10). This call returns a range object containing the numbers 5 to 9. Now we have a lower and an upper bound for the range. Unlike the stop value, the start value is included in the range.
- If you call range() with three arguments, the first is the start value, the second is the stop value, and the third is the step size. For example, range(5, 15, 2) returns a range object containing the values 5, 7, 9, 11, 13. As you can see, the range begins at the start value and keeps adding the step value as long as the result is less than the stop value (for a positive step).
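To check the three call forms described above, you can convert each range object to a list. A range object produces its values lazily, so printing it directly only shows something like range(0, 10), not the numbers themselves.

```python
# One argument: stop only (start defaults to 0, step to 1)
one_arg = list(range(10))
print(one_arg)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Two arguments: start (included) and stop (excluded)
two_args = list(range(5, 10))
print(two_args)    # [5, 6, 7, 8, 9]

# Three arguments: start, stop, and step size
three_args = list(range(5, 15, 2))
print(three_args)  # [5, 7, 9, 11, 13]
```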
In our problem, the chunks have a length of 10, the start value is 0, and the stop value is the length of the data list.
So, if you call range(0, len(data), 10), it’ll iterate over the chunks’ start indices. Let’s put in some numbers to illustrate this:
For one single day, we have a data length of 24 * 10 = 240, so the call of the range function is range(0, 240, 10), and the resulting range is 0, 10, 20, 30, …, 230. Pause a moment and consider these values: they are the indices of the first element of each chunk.
So what do we have now? The start indices of each chunk and also the length – and that’s all we need to slice the input data into the chunks we need.
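Here is a small sketch of that arithmetic for one day, assuming 10 readings per hour as in the problem formulation. The placeholder readings are simply the numbers 0 to 239 so the indices are easy to follow; they are not real sensor data.

```python
readings_per_hour = 10
day_length = 24 * readings_per_hour  # 240 readings per day

# Placeholder data: reading k is just the number k
data = list(range(day_length))

# Start index of every chunk: 0, 10, 20, ..., 230
starts = list(range(0, day_length, readings_per_hour))
print(len(starts), starts[0], starts[-1])  # 24 0 230

# A start index plus the chunk length is all we need to slice out each chunk
chunks = [data[i:i + readings_per_hour] for i in starts]
print(chunks[1])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```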
Background: Slicing
The slicing operator takes two or three arguments separated by the colon (:) symbol. They have the same meaning as the arguments of the range() function.
Slicing carves out a subsequence from a given sequence, for example a substring from a string or a sublist from a list. Use the slicing notation s[start:stop:step] to access every step-th element, starting from index start (included) and ending at index stop (excluded). All three arguments are optional, so you can skip them to use the default values (start=0, stop=len(s), step=1). For example, for the string s = 'hello', the expression s[2:4] carves out the slice 'll' and the expression s[:3:2] carves out the slice 'hl'.
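The same notation works on lists, which is what the chunking code relies on. One detail matters for chunking: a stop index past the end of the list is silently clipped instead of raising an error, so the last chunk is simply shorter when the list length is not a multiple of the chunk length. The list below is an arbitrary example.

```python
lst = [1, 2, 3, 4, 5, 6, 7]

middle = lst[2:5]       # start included, stop excluded
every_third = lst[::3]  # default start and stop, step 3
tail = lst[5:10]        # stop beyond len(lst) is clipped, no IndexError

print(middle)       # [3, 4, 5]
print(every_third)  # [1, 4, 7]
print(tail)         # [6, 7]
```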
However, we can still improve this code and make it reusable by creating a generator out of it.
Chunking With a Generator Function
A generator function looks like a regular function, but instead of a return statement it uses the keyword yield.
The keyword yield pauses the function and hands a value back to the caller. Calling a generator function does not run its body right away; it returns a generator object. Each time the next value is requested from that object, execution resumes right after the last yield and runs until it reaches the next yield, which hands back the next value and pauses again. This behavior fits a for loop perfectly: the loop gets a value from the generator, works with that value inside the loop body, and then repeats with the next value. Now, let’s take a look at the improved version of our code:
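To see the pause-and-resume behavior directly, you can drive a small generator by hand with the built-in next() function; a for loop does the same thing behind the scenes. The function count_up_to() is a hypothetical example for illustration only, not part of the chunking code.

```python
def count_up_to(limit):
    """Hypothetical example generator: yields 1, 2, ..., limit."""
    n = 1
    while n <= limit:
        yield n  # pause here and hand n to the caller
        n += 1   # resume here when the next value is requested

gen = count_up_to(3)  # creates a generator object; the body has not run yet
print(next(gen))      # 1
print(next(gen))      # 2
print(next(gen))      # 3
# Another next(gen) would raise StopIteration, which is how a for loop knows to stop.
print(list(count_up_to(3)))  # [1, 2, 3]
```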
```python
data = [15.7, 16.2, 16.5, 15.9, ..., 27.3, 26.4, 26.1, 27.2]  # "..." = omitted readings
chunk_length = 10

def make_chunks(data, length):
    # Yield one chunk at a time instead of building all chunks up front
    for i in range(0, len(data), length):
        yield data[i:i+length]

for chunk in make_chunks(data, chunk_length):
    print(chunk)
```
That already looks pretty Pythonic, and we can reuse the function make_chunks() for all the other data we need to process.
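Because make_chunks() relies only on len() and slicing, it should also work for other sequences such as strings, and it handles lengths that are not a multiple of the chunk size by yielding a shorter final chunk. A quick check with made-up inputs:

```python
def make_chunks(data, length):
    """Yield consecutive slices of `data`, each at most `length` items long."""
    for i in range(0, len(data), length):
        yield data[i:i + length]

# 7 items in chunks of 3: the last chunk holds the single leftover item
list_chunks = list(make_chunks([1, 2, 3, 4, 5, 6, 7], 3))
print(list_chunks)  # [[1, 2, 3], [4, 5, 6], [7]]

# Slicing a string returns substrings, so the chunks are strings too
string_chunks = list(make_chunks("abcdefgh", 3))
print(string_chunks)  # ['abc', 'def', 'gh']
```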
Example: Averaging over Chunks
Let’s finish the code so that we get a list of hourly average temperatures as the result.
```python
import random

def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

def process(chunk):
    # Average of one chunk, rounded to two decimal places
    return round(sum(chunk)/len(chunk), 2)

n = 10  # readings per hour

# generate random temperature values (between 0 and 20) for 24 hours
day_temperatures = [random.random() * 20 for _ in range(24 * n)]

avg_per_hour = []
for chunk in make_chunks(day_temperatures, n):
    r = process(chunk)
    avg_per_hour.append(r)

print(avg_per_hour)
```
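Because the temperatures above are random, the printed averages change on every run. To check the logic, you can feed the same two functions hand-made readings in which every value in hour h equals h; the hourly averages should then come out as 0.0, 1.0, …, 23.0. This test data is invented purely for the check.

```python
def make_chunks(data, length):
    for i in range(0, len(data), length):
        yield data[i:i + length]

def process(chunk):
    return round(sum(chunk) / len(chunk), 2)

n = 10  # readings per hour

# Invented test data: all 10 readings in hour h have the value h
day_temperatures = [float(hour) for hour in range(24) for _ in range(n)]

# One average per chunk, i.e., one average per hour
avg_per_hour = [process(chunk) for chunk in make_chunks(day_temperatures, n)]
print(avg_per_hour[:3], len(avg_per_hour))  # [0.0, 1.0, 2.0] 24
```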
And that’s it: this Pythonic code solves our problem. We can make the code a bit shorter, but I consider the shorter version less readable because it relies on more advanced Python concepts.
```python
import random

# A generator expression (note the parentheses) wrapped in a lambda
make_chunks = lambda data, n: (data[i:i + n] for i in range(0, len(data), n))
process = lambda data: round(sum(data)/len(data), 2)

n = 10
# generate random temperature values
day_temperatures = [random.random() * 20 for _ in range(24 * n)]

avg_per_hour = []
for chunk in make_chunks(day_temperatures, n):
    r = process(chunk)
    avg_per_hour.append(r)

print(avg_per_hour)
```
So, what did we do? We replaced the helper functions with lambda expressions, and instead of a generator function we use a generator expression: comprehension syntax wrapped in parentheses, which produces a generator rather than a list.
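To convince yourself that the two versions are interchangeable, the following sketch checks that the parenthesized expression really produces a generator and yields the same chunks as the generator function from earlier. The name make_chunks_expr and the sample list of 25 numbers exist only for this comparison.

```python
import types

def make_chunks(data, length):
    # Generator function version
    for i in range(0, len(data), length):
        yield data[i:i + length]

# Generator expression version, written as a lambda like in the shorter code
make_chunks_expr = lambda data, n: (data[i:i + n] for i in range(0, len(data), n))

data = list(range(25))
gen = make_chunks_expr(data, 10)
print(isinstance(gen, types.GeneratorType))       # True: parentheses give a lazy generator
print(list(gen) == list(make_chunks(data, 10)))   # True
```

With square brackets instead of parentheses, the same expression would build the complete list of chunks immediately instead of producing them one at a time.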
Summary
We used the range() function with three arguments: the start value, the stop value, and the step value. By setting the step value to our desired chunk length, the start value to 0, and the stop value to the total data length, we get a range object containing all the start indices of our chunks. With the help of slicing, we can access exactly the chunk we need in each iteration step.