Real-world data is seldom clean: it may contain errors because of faulty sensors, or it may contain missing data because of damaged sensors. In this one-liner section, you'll learn how to quickly handle smaller cleaning tasks in a single line of code.

## The Basics

Say you've installed a temperature sensor in your garden to measure the temperature over a period of many weeks. Every Sunday, you take the sensor out of the garden and bring it into your house to digitize the sensor values. Now you realize that the Sunday sensor values are faulty because the sensor partially measured the temperature inside your home rather than at the outside location.

In this mini code project, you want to “clean” your data by replacing every Sunday sensor value with the average sensor value of the last seven days. But before we dive into the code, let's explore the most important concepts you need to understand.

In the previous chapters, you have already learned about slicing and slice assignment in Python. NumPy's slice assignment works similarly: you specify the values to be replaced on the left-hand side of the assignment and the values that replace them on the right-hand side. Here is an example:

```python
import numpy as np

a = np.array([4] * 16)
print(a)
# [4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4]

a[1::] = [16] * 15
print(a)
# [ 4 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16]
```

The code snippet creates an array containing the value 4 sixteen times. Then we use slice assignment to replace the 15 trailing values with the value 16. Recall that the notation a[start:stop:step] selects the sequence starting at index “start”, ending at index “stop” (exclusive), and considering only every “step”-th element. Omitted values default to the beginning of the array, the end of the array, and a step of 1. Thus, a[1::] selects all elements except the first one.
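To see which positions a slice selects before you assign to it, you can apply the same slice to an index array. The minimal sketch below (the name `idx` is illustrative) also shows that when the right-hand side is a list, it must supply exactly one value per selected element; otherwise NumPy raises a `ValueError`.

```python
import numpy as np

a = np.array([4] * 16)

# Apply the slices to an index array to see which positions they select.
idx = np.arange(16)
print(idx[1::])    # every index except 0
print(idx[1:8:2])  # [1 3 5 7]

# a[1::] and a[1:] select the same 15 elements.
print(len(a[1::]), len(a[1:]))  # 15 15

# The right-hand list must have one value per selected element.
try:
    a[1:] = [16] * 14  # 14 values for 15 slots
except ValueError as err:
    print("ValueError:", err)
```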

```python
import numpy as np

a = np.array([4] * 16)
a[1:8:2] = 16
print(a)
# [ 4 16  4 16  4 16  4 16  4  4  4  4  4  4  4  4]
```

This example shows how to use slice assignment with all three parameters specified. An interesting twist is that we specify only a single value, “16”, to replace all selected elements. Do you already know the name of this feature? Correct, broadcasting is the name of the game! NumPy automatically stretches the scalar on the right-hand side of the assignment to the shape of the slice selected on the left-hand side (here, four elements), so every selected element receives the value 16.
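Broadcasting is not limited to scalars. The sketch below (array names are illustrative) compares the scalar assignment with an explicit array that already has the slice's shape, and then broadcasts a one-dimensional row into every row of a two-dimensional slice.

```python
import numpy as np

# Scalar on the right-hand side: NumPy stretches 16 to the slice's shape (4,).
a = np.array([4] * 16)
a[1:8:2] = 16

# Equivalent assignment with an explicit array of the slice's shape.
b = np.array([4] * 16)
b[1:8:2] = np.full(b[1:8:2].shape, 16)
print(np.array_equal(a, b))  # True

# A 1-D row broadcast into a 2-D slice of shape (3, 2):
# the row [7, 8] is repeated for each of the 3 rows.
m = np.zeros((3, 4), dtype=int)
m[:, 1:3] = [7, 8]
print(m)
# [[0 7 8 0]
#  [0 7 8 0]
#  [0 7 8 0]]
```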

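Before we investigate how to solve the problem with a new one-liner, let me quickly explain the shape property of NumPy arrays. Every array has an associated shape attribute (a tuple). The shape tuple contains one entry per dimension, and each entry gives the number of elements along that axis: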

```python
import numpy as np

a = np.array([1, 2, 3])
print(a)
"""
[1 2 3]
"""
print(a.shape)
# (3,)

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b)
"""
[[1 2 3]
 [4 5 6]]
"""
print(b.shape)
# (2, 3)

c = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])
print(c)
"""
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
"""
print(c.shape)
# (2, 2, 3)
```

We create three arrays a, b, and c. Array a is one-dimensional, so the shape tuple has only a single element. Array b is two-dimensional, so the shape tuple has two elements. Finally, array c is three-dimensional, so the shape tuple has three elements.
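Two related attributes follow directly from the shape tuple: `ndim` equals the length of `shape`, and `size` equals the product of its entries. Here is a short check that redefines array `c` from the example above so the snippet runs on its own (the name `flat` is illustrative):

```python
import numpy as np

c = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

print(c.shape)  # (2, 2, 3)
print(c.ndim)   # 3, the number of entries in c.shape
print(c.size)   # 12 = 2 * 2 * 3 elements in total

# Reshaping keeps all 12 elements but changes the shape tuple.
flat = c.reshape(12)
print(flat.shape)  # (12,)
```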

This is everything you need to know to solve the following problem:

## The Code

Given an array of temperature values, replace every seventh temperature value with the average of the last seven days.

```python
## Dependencies
import numpy as np

## Sensor data (M, T, W, T, F, Sa, Su)
tmp = np.array([1, 2, 3, 4, 3, 4, 4,
                5, 3, 3, 4, 3, 4, 6,
                6, 5, 5, 5, 4, 5, 5])

## One-liner
tmp[6::7] = np.average(tmp.reshape((-1,7)), axis=1)

## Result
print(tmp)
```

Take a guess: what’s the output of this code snippet?

## The Result

First, the puzzle creates the array “tmp” holding a one-dimensional sequence of sensor values. Each line of the array literal contains the seven sensor values of one week (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday).

Second, we use slice assignment to replace all the Sunday values of this array. Because Sunday is the seventh day of the week, the expression “tmp[6::7]” starts at index 6 (the seventh element, i.e., the first Sunday) and then selects every seventh element, which are exactly the Sunday sensor values.
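To confirm which positions “tmp[6::7]” touches, you can apply the same slice to an index array of the same length (21 values, i.e., three weeks). The name `idx` is illustrative:

```python
import numpy as np

# Indices of the 21 sensor values (three weeks of seven days each).
idx = np.arange(21)

# Start at index 6 (the first Sunday), then take every seventh index.
print(idx[6::7])  # [ 6 13 20]
```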

Third, we reshape the one-dimensional sensor array into a two-dimensional array with seven columns. This makes it easy to calculate the weekly average temperature that replaces the Sunday data. Note that the placeholder value -1 in the shape tuple (in “tmp.reshape((-1,7))”) tells NumPy to infer the number of rows (axis 0) automatically from the array size: 21 values / 7 columns = 3 rows. In our case, reshaping results in the following array:

```python
print(tmp.reshape((-1,7)))
"""
[[1 2 3 4 3 4 4]
 [5 3 3 4 3 4 6]
 [6 5 5 5 4 5 5]]
"""
```

It’s one row per week and one column per weekday.
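The -1 placeholder only works if NumPy can infer a whole number of rows. The sketch below uses `np.arange` as stand-in data (the name `weeks` is illustrative) to show the inferred shape, the link between the reshaped columns and the slice from the second step (column 6 holds exactly the values that “tmp[6::7]” selects), and a failing case with 20 values.

```python
import numpy as np

tmp = np.arange(21)  # stand-in for the 21 sensor values

weeks = tmp.reshape((-1, 7))
print(weeks.shape)  # (3, 7): NumPy infers 21 / 7 = 3 rows

# Column 6 (Sunday) of the reshaped array equals the slice tmp[6::7].
print(np.array_equal(weeks[:, 6], tmp[6::7]))  # True

# 20 values cannot be split into rows of 7, so reshape raises a ValueError.
try:
    np.arange(20).reshape((-1, 7))
except ValueError as err:
    print("ValueError:", err)
```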

Now we calculate the seven-day average by collapsing every row into a single value using the np.average() function with the axis argument: axis=1 means that the second axis (the columns) is collapsed, so each row is reduced to its average. This is the result of the right-hand side of the assignment:

```python
print(np.average(tmp.reshape((-1,7)), axis=1))
# [3. 4. 5.]
```
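The axis argument decides which direction is averaged. With the same sensor values, axis=1 yields one average per week (row), whereas axis=0 would yield one average per weekday (column). Without a `weights` argument, `np.average` returns the same values as `np.mean`. The name `weeks` is illustrative:

```python
import numpy as np

tmp = np.array([1, 2, 3, 4, 3, 4, 4,
                5, 3, 3, 4, 3, 4, 6,
                6, 5, 5, 5, 4, 5, 5])
weeks = tmp.reshape((-1, 7))

print(np.average(weeks, axis=1))  # [3. 4. 5.]: one value per week
print(np.average(weeks, axis=0))  # seven values: one per weekday

# Without weights, np.average and np.mean agree.
print(np.allclose(np.average(weeks, axis=1), weeks.mean(axis=1)))  # True
```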

After replacing all Sunday sensor values, we get the following final result of the one-liner:

```python
# [1 2 3 4 3 4 3 5 3 3 4 3 4 4 6 5 5 5 4 5 5]
```
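Two details are worth checking when you adapt this one-liner. First, “tmp” has an integer dtype, so assigning float averages into it truncates them; the puzzle comes out exact only because every weekly average happens to be a whole number. Second, the weekly average includes the faulty Sunday value itself. The sketch below is a variation, not the original one-liner: it converts the data to floats and averages only Monday through Saturday (the first six columns). The names `clean` and `truncated` are illustrative.

```python
import numpy as np

tmp = np.array([1, 2, 3, 4, 3, 4, 4,
                5, 3, 3, 4, 3, 4, 6,
                6, 5, 5, 5, 4, 5, 5])

# Variation: a float copy keeps non-integer averages instead of truncating them.
clean = tmp.astype(float)
# Average only the six non-Sunday columns (Monday..Saturday) of each week.
clean[6::7] = clean.reshape((-1, 7))[:, :6].mean(axis=1)
print(clean[6::7])  # the weekly values 17/6, 22/6, and 30/6

# The same assignment into an integer copy truncates: 17/6 -> 2, 22/6 -> 3.
truncated = tmp.copy()
truncated[6::7] = truncated.reshape((-1, 7))[:, :6].mean(axis=1)
print(truncated[6::7])  # [2 3 5]
```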

## Where to go from here?

Do you love data science but struggle to put everything together and develop a good intuition for the NumPy library?

To help you

Get your “Coffee Break NumPy” now!