5 Best Ways to Normalize a NumPy Array Between 0 and 1

πŸ’‘ Problem Formulation: Normalizing data can be essential for machine learning and other statistical techniques to ensure all input features have the same scale. This task becomes crucial when we work with NumPy arrays in Python. Given an input array, like np.array([2, 4, 6, 8, 10]), normalization rescales the elements to fall within the range [0, 1]. The desired output for this array after normalization would be an array where the smallest value (2) becomes 0, the largest value (10) becomes 1, and all other values are proportionally adjusted between 0 and 1.

Method 1: Min-Max Scaling

The min-max scaling method involves subtracting the minimum value of the array and then dividing by the range of the array. By definition, min-max normalization is the process of converting an actual range of values which can lie between any range of values into a standard range of values typically in the scale of 0 to 1.

Here’s an example:

import numpy as np

def min_max_scale(array):
    return (array - array.min()) / (array.max() - array.min())

# Example array
arr = np.array([2, 4, 6, 8, 10])
normalized_arr = min_max_scale(arr)
print(normalized_arr)

Output:

[0.   0.25 0.5  0.75 1.  ]

This piece of code defines a function min_max_scale that takes a NumPy array as input. It subtracts the minimum value of the array from every element and then divides each by the range, which is the value of the maximum element minus the minimum element. The result is a new array where the smallest element is scaled to 0, the largest to 1, and all other elements to a decimal between 0 and 1.

Method 2: Using the NumPy interpolate Method

NumPy does not have a built-in interpolate method specifically for normalization, but you can simulate it by using the linear space and interpolation functionalities it provides.

Here’s an example:

import numpy as np

arr = np.array([2, 4, 6, 8, 10])
normalized_arr = np.interp(arr, (arr.min(), arr.max()), (0, 1))
print(normalized_arr)

Output:

[0.   0.25 0.5  0.75 1.  ]

This snippet uses NumPy’s interp function to achieve normalization of an array. The interp function takes the array to normalize, a tuple of the original data range, and a tuple defining the range over which to normalize, which in this case is (0, 1). The normalized array has its elements linearly rescaled within the new range.

Method 3: Division by L1 Norm

Normalizing an array by its L1 norm scales the array so that the sum of the absolute values of its components is 1. This form of normalization is not strictly scaling between 0 and 1 but provides a proportional scaling that maintains the array’s spatial representation.

Here’s an example:

import numpy as np

arr = np.array([2, 4, 6, 8, 10])
l1_norm = np.sum(np.abs(arr))
normalized_arr = arr / l1_norm
print(normalized_arr)

Output:

[0.06666667 0.13333333 0.2        0.26666667 0.33333333]

Here we calculate the L1 norm (or Manhattan norm) which is the sum of the absolute values of the array. Each element of the array is then divided by this L1 norm, resulting in a normalized array.

Method 4: Division by L2 Norm

Normalizing an array by its L2 norm (also known as the Euclidean norm) scales the array elements so that the square root of the sum of the squares is 1. This type of normalization is popular in vector space models in machine learning and maintains the direction of the original array.

Here’s an example:

import numpy as np

arr = np.array([2, 4, 6, 8, 10])
l2_norm = np.sqrt(np.sum(np.square(arr)))
normalized_arr = arr / l2_norm
print(normalized_arr)

Output:

[0.10910895 0.21821789 0.32732684 0.43643578 0.54554473]

In this code, the L2 norm is computed by summing the squares of the array elements, taking the square root of this sum, and then dividing each element by the result. The normalized array maintains its original direction but the norm of the vector becomes 1.

Bonus One-Liner Method 5: Using NumPy’s ptp Function

Utilizing NumPy’s ptp function lets you compact normalizing into a one-liner. ptp stands for “peak-to-peak”, and it returns the range of values along an axis, which can be used to normalize the array.

Here’s an example:

import numpy as np

arr = np.array([2, 4, 6, 8, 10])
normalized_arr = (arr - arr.min()) / np.ptp(arr)
print(normalized_arr)

Output:

[0.   0.25 0.5  0.75 1.  ]

This snippet shows a concise way to normalize an array. The ptp function finds the range (max – min) of the array, which is used in the denominator to scale the elements after the minimum value has been subtracted from them.

Summary/Discussion

  • Method 1: Min-Max Scaling. Simple concept and widely used. Might not handle outliers well.
  • Method 2: NumPy interpolate Method. Very concise. Less transparent than method 1 for those unfamiliar with interpolation functions.
  • Method 3: Division by L1 Norm. Useful when the sum of values is relevant. Does not scale into 0-1 range explicitly.
  • Method 4: Division by L2 Norm. Preserves vector direction. Does not scale into 0-1 range explicitly. Assumes importance of Euclidean norm.
  • Method 5: Using NumPy’s ptp Function. Extremely concise. Essentially equivalent to Min-Max Scaling but uses a built-in function for the range of values.