This one-liner demonstrates the power of three interesting NumPy features and how their combination can solve a small data science problem in a clean and efficient manner.
Every NumPy array has a certain data type. Keep this in mind when exploring the following one-liner. Although you need to know three different pieces of information to understand the one-liner, you can probably already guess what’s happening there. But before you read on, think about how you would address the following problem: say you have a NumPy array with many rows. You want to change the values of one specific row. But not all of them, only every other value. How would you accomplish this?
The specific problem we address in the following one-liner is: how do you increase the salaries of data scientists by 10%, but only in every other year, in a single line of code?
## Dependencies
import numpy as np

## Data: salary in ($1000) [2017, 2018, 2019]
dataScientist = [130, 132, 137]
productManager = [127, 140, 145]
designer = [118, 118, 127]
softwareEngineer = [129, 131, 137]

employees = np.array([dataScientist,
                      productManager,
                      designer,
                      softwareEngineer])

## One-liner
employees[0,::2] = employees[0,::2] * 1.1

## Result
print(employees)
Take a minute and think about the output of this code snippet. What would you expect to change? What’s the data type of the resulting array?
Say you are working at a company and the accountant asks you to analyze salary data of different employees in your company. You create a NumPy array that holds the relevant data: each row gives the yearly salaries of one profession (data scientist, product manager, designer, or software engineer). Each column corresponds to one year (2017, 2018, 2019). Hence, the resulting NumPy array has four rows and three columns.
The accountant tells you that there is some money left and management wants to reward the most important professionals in the company. You believe in the strong future of data science, so you convince the accountant to give more money to the hidden heroes of your company: the data scientists. What’s the best way of updating the NumPy array so that only the data scientists’ salaries increase by 10% (but only every other year, starting from the first year in your database)?
After a while, you develop the following beautiful one-liner:
employees[0,::2] = employees[0,::2] * 1.1
It looks simple and clean but there are three interesting concepts at play which only an advanced NumPy expert will know. You’ll learn about them in a moment.
But first, let’s have a look at the result:
[[143 132 150]
 [127 140 145]
 [118 118 127]
 [129 131 137]]
Did you expect the resulting NumPy array to look like this one?
What are the three interesting concepts?
First, the line uses both slicing and slice assignment in NumPy. In the example, we use slicing to get every other value of the first row from the NumPy array employees (the index 0 selects the first row, and ::2 selects every second column, starting with the first). Then, we modify these values and write them back to the same positions using slice assignment. This procedure replaces only every other value of the first row with the updated salary data; the rest of the array stays untouched.
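To see the two steps in isolation, here is a minimal sketch. The array a is a made-up example that does not appear in the salary data; it only illustrates slicing followed by slice assignment.

```python
import numpy as np

# Hypothetical 2x5 integer array, used only for illustration
a = np.arange(10).reshape(2, 5)    # rows: [0 1 2 3 4] and [5 6 7 8 9]

# Slicing: row 0, every other column (step 2, starting at column 0)
every_other = a[0, ::2]
print(every_other.tolist())        # [0, 2, 4]

# Slice assignment: overwrite only those positions in row 0
a[0, ::2] = [10, 20, 30]
print(a[0].tolist())               # [10, 1, 20, 3, 30]
print(a[1].tolist())               # [5, 6, 7, 8, 9]  (row 1 is untouched)
```

Note that basic slicing returns a view rather than a copy, so every_other also reflects the new values after the assignment.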
Second, although you may not have realized it, we used a powerful concept called “broadcasting” in NumPy. Broadcasting means that NumPy automatically makes element-wise operations work on operands with different shapes. For example, the multiplication operator * usually performs element-wise multiplication when applied to one- or multi-dimensional NumPy arrays. But in the one-liner, the left operand is a NumPy array while the right operand is a float value. Of course, NumPy could simply throw an error in this case and let the developer fix it, but the creators of the library decided to implement the intuition of the programmer. Conceptually, NumPy stretches the float value into an array with the same shape as the left operand, filled with copies of the value (internally, no such array needs to be created). So the computation behaves like the following:
np.array([130, 137]) * np.array([1.1, 1.1])
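The following short sketch checks this equivalence. The variable names are chosen here for illustration only; np.broadcast_to is used merely to make the conceptual "stretching" of the scalar visible.

```python
import numpy as np

# The values selected by employees[0, ::2] in the example
salaries = np.array([130, 137])

# Array times scalar: the scalar is broadcast over the array
scalar_result = salaries * 1.1

# The same computation with the scalar spelled out as an array
explicit_result = salaries * np.array([1.1, 1.1])

# What the scalar conceptually looks like after broadcasting
expanded = np.broadcast_to(1.1, salaries.shape)

print(np.array_equal(scalar_result, explicit_result))  # True
print(expanded.shape)                                  # (2,)
print(scalar_result.dtype)                             # float64
```

Note that the product itself is a float array; the conversion back to integers only happens later, when the result is assigned into the integer array.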
Third, you may have noticed that the resulting data type is not float but integer, even though we are performing floating-point arithmetic. The reason is simple once you know it: every NumPy array has an associated data type (which you can access using the dtype property). When creating the array, NumPy realized that it contains only integer values, so it created an integer array. The multiplication employees[0,::2] * 1.1 does produce float values, but assigning them back into the integer array converts them to integers again. NumPy does not round here; it cuts off the fractional part, which is why 137 * 1.1 = 150.7 becomes 150 rather than 151. You can see that the data type stays the same in the following example:
print(employees.dtype)
# int64 or int32, depending on your platform
employees[0,::2] = employees[0,::2] * 1.1
print(employees.dtype)
# same integer type as before
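If you want to keep the fractional part, one option is to convert the array to a float type before applying the raise. The sketch below contrasts both behaviors on the data scientist row alone; the names row and row_f are introduced here for illustration only.

```python
import numpy as np

# Integer row: the dtype is inferred from the integer values
row = np.array([130, 132, 137])
raised = row[::2] * 1.1            # float64, roughly [143.0, 150.7]

row[::2] = raised                  # cast back to integer on assignment
print(row.tolist())                # [143, 132, 150]

# Float row: converting first preserves the fractional part
row_f = np.array([130, 132, 137]).astype(float)
row_f[::2] = row_f[::2] * 1.1
print(row_f.dtype)                 # float64
```

Whether truncated integers or exact floats are the right choice depends on how the salary data is used afterwards.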
Where to go from here?
To help you grow your skills from basic Python level to NumPy expertise, I have written a new NumPy book, “Coffee Break NumPy”. It uses proven principles of good teaching such as puzzle-based learning, cheat sheets, and simple tutorials.