[NumPy Tutorial] How to Calculate Post-Tax Income in One Line?

This article shows you how to solve a day-to-day accounting task, which would usually take many lines of Python code, in a single line of code. Along the way, it introduces some elementary features of NumPy, Python’s wildly important library for numerical computations and data science.

The Basics

At the heart of the NumPy library are NumPy arrays (in short: arrays). The NumPy array holds all the data you want to manipulate, analyze, and visualize. Even higher-level data science libraries like Pandas use NumPy arrays implicitly or explicitly for their data analysis. You can think of a NumPy array as a Python list that can be nested and that has some special properties and restrictions. For instance, an array consists of one or more axes (think of them as “dimensions”), and all of its elements share the same data type.

Here are examples of one-dimensional, two-dimensional, and three-dimensional NumPy arrays:

import numpy as np


# 1D array
a = np.array([1, 2, 3])
print(a)
"""
[1 2 3]
"""


# 2D array
b = np.array([[1, 2],
              [3, 4]])
print(b)
"""
[[1 2]
 [3 4]]
"""


# 3D array
c = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])
print(c)
"""
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
"""

Creating a NumPy array is as simple as passing a normal Python list as an argument into the function np.array(). You can see that a one-dimensional array corresponds to a simple list of numerical values. A two-dimensional array corresponds to a nested list of lists of numerical values. Finally, a three-dimensional array corresponds to a nested list of lists of lists of numerical values. You can easily create higher-dimensional arrays with the same procedure.

As a rule of thumb: the number of opening brackets at the very start of the array (before the first value) gives you the dimensionality of the NumPy array.
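If you would rather not count brackets, you can ask the array itself. The following minimal sketch rebuilds the three arrays from above and reads their ndim and shape attributes, which report the number of axes and the length of each axis:

```python
import numpy as np

# The same three arrays as in the example above
a = np.array([1, 2, 3])
b = np.array([[1, 2], [3, 4]])
c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim is the number of axes, shape is the length along each axis
for name, arr in [("a", a), ("b", b), ("c", c)]:
    print(name, arr.ndim, arr.shape)
# a 1 (3,)
# b 2 (2, 2)
# c 3 (2, 2, 2)
```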

One of the advantages of NumPy arrays is that they overload the basic arithmetic operators ‘+’, ‘-’, ‘*’, and ‘/’. Semantically, think of these as “element-wise operations”. For example, see how the following operations on two-dimensional arrays behave:

import numpy as np


a = np.array([[1, 0, 0],
              [1, 1, 1],
              [2, 0, 0]])

b = np.array([[1, 1, 1],
              [1, 1, 2],
              [1, 1, 2]])


print(a + b)
"""
[[2 1 1]
 [2 2 3]
 [3 1 2]]
"""

print(a - b)
"""
[[ 0 -1 -1]
 [ 0  0 -1]
 [ 1 -1 -2]]
"""

print(a * b)
"""
[[1 0 0]
 [1 1 2]
 [2 0 0]]
"""

print(a / b)
"""
[[1.  0.  0. ]
 [1.  1.  0.5]
 [2.  0.  0. ]]
"""

If you look closely, you’ll find that each operation combines two NumPy arrays element-wise. For example, adding two arrays results in a new array where each value is the sum of the values at the same position in the first and the second array.
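To make “element-wise” concrete, here is a rough sketch of what the ‘+’ operator does for you, written with plain nested lists. The name manual_sum is only illustrative, and this is not how NumPy implements addition internally; it merely produces the same values:

```python
import numpy as np

a = np.array([[1, 0, 0], [1, 1, 1], [2, 0, 0]])
b = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 2]])

# Pair up the rows, then pair up the values within each row
manual_sum = [[x + y for x, y in zip(row_a, row_b)]
              for row_a, row_b in zip(a.tolist(), b.tolist())]
print(manual_sum)
# [[2, 1, 1], [2, 2, 3], [3, 1, 2]]

# The overloaded operator gives the same values in one expression
print(np.array_equal(a + b, np.array(manual_sum)))
# True
```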

But NumPy provides a lot more capabilities for manipulating arrays. For example, the np.max() function calculates the maximal value of all values in a NumPy array. The np.min() function calculates the minimal value of all values in a NumPy array. And the np.average() function calculates the average value of all values in a NumPy array.

Here is an example of those three operations:

import numpy as np


a = np.array([[1, 0, 0],
              [1, 1, 1],
              [2, 0, 0]])

print(np.max(a))
# 2

print(np.min(a))
# 0

print(np.average(a))
# 0.6666666666666666

The maximal value of all values in the NumPy array is 2, the minimal value is 0, and the average is (1+0+0+1+1+1+2+0+0)/9=2/3. Again, NumPy is much more powerful than that – but this is already enough to solve the following problem: “How to find the maximal after-tax income of a number of people, given their yearly salary and tax rates?”

The Code

Let’s have a look at this problem. Given is the salary data of Alice, Bob, and Tim. It seems like Bob has enjoyed the highest salary in the last three years. But is this really the case considering the individual tax rates of our three friends?

## Dependencies
import numpy as np


## Data: yearly salary in ($1000) [2017, 2018, 2019]
alice = [99, 101, 103]
bob = [110, 108, 105]
tim = [90, 88, 85]

salaries = np.array([alice, bob, tim])
taxation = np.array([[0.2, 0.25, 0.22],
                     [0.4, 0.5, 0.5],
                     [0.1, 0.2, 0.1]])


## One-liner
max_income = np.max(salaries - salaries * taxation)

               
## Result
print(max_income)

Take a guess: what’s the output of this code snippet?

The Result

In the code snippet, the first statement imports the NumPy library into the namespace using the de-facto standard name for the NumPy library: np. The following few statements create the data: two two-dimensional NumPy arrays, each with three rows (one row for each person Alice, Bob, and Tim) and three columns (one column for each year 2017, 2018, and 2019). The two matrices are called salaries and taxation. The former holds the yearly incomes, while the latter holds the taxation rates for each person and year.
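If you want to double-check this layout, a short sketch like the following may help. It repeats the data from the snippet so that it runs on its own, and it uses standard NumPy indexing to pull out one person’s row and one year’s column:

```python
import numpy as np

# Same data as in the article: rows are Alice, Bob, Tim; columns are 2017-2019
salaries = np.array([[99, 101, 103],
                     [110, 108, 105],
                     [90, 88, 85]])
taxation = np.array([[0.2, 0.25, 0.22],
                     [0.4, 0.5, 0.5],
                     [0.1, 0.2, 0.1]])

print(salaries.shape, taxation.shape)
# (3, 3) (3, 3)

# Row 1 holds Bob's salaries for all three years
print(salaries[1].tolist())
# [110, 108, 105]

# Column 0 holds everyone's salary in 2017
print(salaries[:, 0].tolist())
# [99, 110, 90]
```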

To calculate the after-tax income, you need to deduct the tax (as a dollar amount) from the gross income stored in the array ‘salaries’. We use the overloaded NumPy operators ‘-’ and ‘*’ to achieve exactly this. Again, both operators perform element-wise computations on the NumPy arrays. As a side note, the element-wise multiplication of two matrices is called the “Hadamard product”.

Let’s examine what the NumPy array looks like after deducting the taxes from the gross incomes:
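The deduction can be split into two steps, which may make the arithmetic easier to follow. The sketch below uses the illustrative names tax_paid, net_a, and net_b (they do not appear in the one-liner) and also shows that multiplying by (1 - taxation) is an equivalent way to write the same computation, up to floating-point rounding:

```python
import numpy as np

salaries = np.array([[99, 101, 103],
                     [110, 108, 105],
                     [90, 88, 85]])
taxation = np.array([[0.2, 0.25, 0.22],
                     [0.4, 0.5, 0.5],
                     [0.1, 0.2, 0.1]])

# Step 1: tax as an amount (in $1000), element-wise product
tax_paid = salaries * taxation

# Step 2: subtract the tax from the gross income, element-wise
net_a = salaries - tax_paid

# Equivalent formulation: keep the untaxed share of each salary
net_b = salaries * (1 - taxation)

# Compare with a tolerance, because floats may differ in the last digits
print(np.allclose(net_a, net_b))
# True
```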

print(salaries - salaries * taxation)
"""
[[79.2  75.75 80.34]
 [66.   54.   52.5 ]
 [81.   70.4  76.5 ]]
"""

You can see that Bob’s large income (see the second row of the NumPy array) shrinks considerably after he pays tax rates of 40% and 50%.

The one-liner computes the maximal value of this resulting array, and the final statement prints it. By default, the np.max() function finds the maximal value across all values in the array. Thus, the maximal value is Tim’s $90,000 income in 2017, which is taxed at only 10%. The code prints 81.0, that is, $81,000 after tax.
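The one-liner tells you the amount, but not who earned it or when. As a possible extension, the following sketch uses np.argmax() together with np.unravel_index() to recover the row and column of the maximum, and the axis argument of np.max() to get the best year per person. The lists names and years are helper variables introduced here for illustration:

```python
import numpy as np

names = ["Alice", "Bob", "Tim"]   # row labels (illustrative helper)
years = [2017, 2018, 2019]        # column labels (illustrative helper)

salaries = np.array([[99, 101, 103],
                     [110, 108, 105],
                     [90, 88, 85]])
taxation = np.array([[0.2, 0.25, 0.22],
                     [0.4, 0.5, 0.5],
                     [0.1, 0.2, 0.1]])
net = salaries - salaries * taxation

# argmax returns a position in the flattened array;
# unravel_index converts it back into (row, column)
row, col = np.unravel_index(np.argmax(net), net.shape)
print(names[row], years[col])
# Tim 2017

# axis=1 takes the maximum along each row, i.e. per person
best_per_person = np.max(net, axis=1)
for name, value in zip(names, best_per_person):
    print(name, round(float(value), 2))
```

With axis=0 instead, np.max() would return the highest after-tax income per year across all three people.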

Where to Go From Here?

Don’t miss out on the data science and machine learning train. To help you grow your skills from basic Python level to NumPy expertise, I have written a new NumPy book, “Coffee Break NumPy”. It uses proven principles of good teaching such as puzzle-based learning, cheat sheets, and simple tutorials. Check it out!

“Coffee Break NumPy: A Simple Road to Data Science Mastery That Fits Into Your Busy Life”
