Trick #1: Slicing and Slice Assignment
This one-liner demonstrates the power of three interesting NumPy features and how their combination can solve a small data science problem in a clean and efficient manner.
Suppose you work at a company and the accountant asks you to analyze the salary data of different employees. You create a NumPy array that holds the relevant data: each row gives the yearly salaries of one profession (data scientist, product manager, designer, or software engineer), and each column corresponds to one year (2017, 2018, 2019). Hence, the resulting NumPy array has four rows and three columns.
The accountant tells you that there is some money left and management wants to reinforce the most important professionals in the company. You convince the accountant to give more money to the hidden heroes of your company: the data scientists.
Problem Formulation: What’s the best way of updating the NumPy array so that only the data scientists’ salaries increase by 10%, and only in every other year starting from the first year in your database?
import numpy as np

## Data: salary in ($1000) [2017, 2018, 2019]
dataScientist = [130, 132, 137]
productManager = [127, 140, 145]
designer = [118, 118, 127]
softwareEngineer = [129, 131, 137]

employees = np.array([dataScientist,
                      productManager,
                      designer,
                      softwareEngineer])

employees[0,::2] = employees[0,::2] * 1.1

## Result
print(employees)
Let’s have a look at the result:
[[143 132 150]
 [127 140 145]
 [118 118 127]
 [129 131 137]]
The line employees[0,::2] = employees[0,::2] * 1.1 uses both slicing and slice assignment in NumPy. On the right-hand side, we use slicing to get every other value of the first row of the NumPy array employees (the 2017 and 2019 salaries). Then, we multiply these values by 1.1 and write them back to the same positions using slice assignment. This replaces only the selected entries of the first row with the updated salary data; the 2018 value and all other rows stay unchanged.
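To see the two concepts in isolation, here is a minimal sketch on a small made-up array. The name grid and its values are illustrative only and are not part of the salary data.

```python
import numpy as np

# Illustrative data, not the salary table from above
grid = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

# Slicing: row 0, every other column (start at index 0, step 2)
every_other = grid[0, ::2]
print(every_other)  # [1 3]

# Slice assignment: overwrite exactly those positions
grid[0, ::2] = [10, 30]
print(grid)
# [[10  2 30  4]
#  [ 5  6  7  8]]
```

The index expression on the left of the assignment selects the same positions as the slice on the right-hand side of the salary one-liner, which is why only those entries change.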
Trick #2: Broadcasting
Second, although you may not have realized it, we used a powerful concept called “broadcasting” in NumPy.
Broadcasting means that NumPy automatically makes element-wise operations work on arrays with different shapes. For example, the multiplication operator * performs element-wise multiplication when applied to two NumPy arrays of the same shape. In the one-liner, however, the row slice is multiplied by the scalar 1.1, and NumPy applies that scalar to every element of the slice.
Broadcasting describes how NumPy automatically brings two arrays with different shapes to a compatible shape during arithmetic operations. Generally, the smaller array is “repeated” multiple times until both arrays have the same shape. Broadcasting is memory-efficient as it doesn’t actually copy the smaller array multiple times.
Here’s a minimal example:
import numpy as np

A = np.array([1, 2, 3])
res = A * 3  # scalar is broadcast to [3 3 3]
print(res)
# [3 6 9]
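Broadcasting also works between two arrays, not only between an array and a scalar. The following sketch multiplies a two-row array by a one-dimensional array of per-year factors. The factors are hypothetical values chosen for easy arithmetic and do not come from the salary example.

```python
import numpy as np

salaries = np.array([[130, 132, 137],
                     [127, 140, 145]])      # shape (2, 3)
year_factors = np.array([1.0, 1.5, 2.0])    # shape (3,), hypothetical values

# The (3,) array is conceptually repeated for each row of the (2, 3) array
scaled = salaries * year_factors
print(scaled.shape)  # (2, 3)
print(scaled)
# [[130. 198. 274.]
#  [127. 210. 290.]]
```

The shapes are compatible because the trailing dimension of both arrays is 3. An array of shape (2,) in place of year_factors would raise an error here instead.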
Read more about this powerful NumPy trick in our detailed guide:
NumPy Broadcasting – A Simple Illustrated Guide
Trick #3: Automatic Type Conversion
In the following code snippet, you’ll see that the data type of the array is not float but integer, even though we are performing floating-point arithmetic.
import numpy as np

## Data: salary in ($1000) [2017, 2018, 2019]
dataScientist = [130, 132, 137]
productManager = [127, 140, 145]
designer = [118, 118, 127]
softwareEngineer = [129, 131, 137]

employees = np.array([dataScientist,
                      productManager,
                      designer,
                      softwareEngineer])

print(employees.dtype)
# int64 on most platforms (int32 on Windows with NumPy versions before 2.0)

employees[0,::2] = employees[0,::2] * 1.1

print(employees.dtype)
# same integer dtype as before
The reason is simple if you know it: every NumPy array has an associated data type (which you can access using the dtype property). When creating the array, NumPy realized that the array contains only integer values, so it created an integer array. The multiplication employees[0,::2] * 1.1 does produce floating-point values, but assigning them back into the integer array casts them to the array’s data type. The array’s data type never changes. NumPy does not round here: it drops the fractional part, which is why 137 * 1.1 = 150.7 is stored as 150.
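The sketch below shows the conversion step by step on a single row, and one possible way to keep the decimals by requesting a float array up front. The variable names are illustrative, and passing dtype=float is just one option (calling astype(float) on an existing array is another).

```python
import numpy as np

row = np.array([130, 132, 137])   # integer dtype inferred from the values
product = row[::2] * 1.1          # the multiplication yields a new float array
print(product.dtype)              # float64

row[::2] = product                # cast back to integer: fractions are dropped
print(row)                        # [143 132 150]

# Option: create a float array so the decimals survive the assignment
row_float = np.array([130, 132, 137], dtype=float)
row_float[::2] = row_float[::2] * 1.1
print(row_float.round(1))         # [143.  132.  150.7]
```

Whether truncated integers or exact floats are preferable depends on how the salary figures are used afterwards.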
Where to Go From Here?
To help you grow your skills from basic Python level to NumPy expertise, I have written a new NumPy book, “Coffee Break NumPy”. It uses proven principles of good teaching such as puzzle-based learning, cheat sheets, and simple tutorials.
Don’t miss out on the data science and machine learning train. Check it out!
“Coffee Break NumPy: A Simple Road to Data Science Mastery That Fits Into Your Busy Life”