5 Best Ways to Find the Maximum in an Array, Ignoring NaNs in Python

💡 Problem Formulation: When working with numerical arrays in Python, you often need the highest value. NaN (Not a Number) values complicate this: NumPy’s plain max() returns NaN if any element is NaN, and Python’s built-in max() returns a result that depends on where the NaN sits in the sequence. This article discusses several methods that return the overall maximum, or the maximum along an axis, while safely ignoring NaNs. For instance, given the input array [1, 2, NaN, 4], the desired result is 4.
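To make the problem concrete, here is a small sketch of what happens when NaN is not handled. The variable names are illustrative only; the behavior shown follows from the fact that every ordering comparison involving NaN evaluates to False.

```python
import numpy as np

values = [1, 2, float("nan"), 4]

# NumPy propagates NaN: the plain maximum of an array containing NaN is NaN.
numpy_max = np.max(np.array(values))

# The built-in max() depends on element order, because every comparison
# involving NaN evaluates to False.
builtin_nan_last = max([1, 2, 4, float("nan")])   # NaN never wins a comparison
builtin_nan_first = max([float("nan"), 1, 2, 4])  # NaN is never displaced

print(numpy_max)          # nan
print(builtin_nan_last)   # 4
print(builtin_nan_first)  # nan
```

The same data gives 4 or nan depending only on where the NaN appears, which is why the methods in this article filter or skip NaNs explicitly.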

Method 1: Using numpy’s nanmax

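The numpy.nanmax() function is designed for exactly this task: it returns the maximum of an array, or the maximum along a given axis, ignoring any NaNs present. If a slice contains only NaNs, it emits a RuntimeWarning and returns NaN for that slice. It’s the go-to function for this problem when using the NumPy library.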

Here’s an example:

import numpy as np

arr = np.array([1, 2, np.nan, 4])
max_val = np.nanmax(arr)

print(max_val)

Output:

4.0

This code snippet creates a NumPy array containing NaN and utilizes the np.nanmax() function to compute the maximum value while skipping the NaN value. The function returns 4.0, correctly identifying the maximum numerical value in the array.
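The example above uses a one-dimensional array. As a minimal sketch of the per-axis case mentioned in the problem formulation, the following applies np.nanmax() to a small 2-D array (the array contents are made up for illustration):

```python
import numpy as np

grid = np.array([[1.0, np.nan, 3.0],
                 [4.0, 2.0, np.nan]])

col_max = np.nanmax(grid, axis=0)  # maximum of each column
row_max = np.nanmax(grid, axis=1)  # maximum of each row

print(col_max)  # [4. 2. 3.]
print(row_max)  # [3. 4.]
```

Each column and row contains at least one real number here, so no all-NaN warning is triggered.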

Method 2: Using numpy’s amax with a Mask

Another NumPy-based method uses numpy.amax() (an alias of numpy.max()) together with a boolean mask that filters out NaNs. This takes two steps, but the mask can be combined with other conditions, which gives more control over which elements are considered.

Here’s an example:

import numpy as np

arr = np.array([1, 2, np.nan, 4])
mask = ~np.isnan(arr)
max_val = np.amax(arr[mask])

print(max_val)

Output:

4.0

In this code snippet, we construct a boolean mask that is True for every non-NaN element and use it to index the original array. The np.amax() function then finds the maximum of the filtered values, so NaNs never reach it.
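One caveat with boolean indexing is that it always returns a flat 1-D array, so the per-axis structure is lost. A possible workaround, sketched here under the assumption that your NumPy version supports the where and initial arguments of np.amax() (added in NumPy 1.17), is to pass the mask to the reduction instead of indexing with it:

```python
import numpy as np

grid = np.array([[1.0, np.nan, 3.0],
                 [4.0, 2.0, np.nan]])
mask = ~np.isnan(grid)

# Boolean indexing flattens the result, so only the overall maximum survives.
overall = np.amax(grid[mask])

# Passing the mask via `where` keeps the 2-D shape; `initial` is required
# together with `where` and serves as the starting value for each row.
row_max = np.amax(grid, axis=1, where=mask, initial=-np.inf)

print(overall)  # 4.0
print(row_max)  # [3. 4.]
```

With this choice of initial, a row consisting only of NaNs would yield -inf rather than NaN, so pick the starting value that suits your data.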

Method 3: Using pandas’ max

The pandas.Series.max() method skips NaN values by default (skipna=True) when computing the maximum, which is handy when your data already lives in pandas data structures.

Here’s an example:

import numpy as np
import pandas as pd

ser = pd.Series([1, 2, np.nan, 4])
max_val = ser.max()

print(max_val)

Output:

4.0

This example builds a pandas Series and calls its .max() method, which ignores NaN values by default. The method returns 4.0, the maximum numerical value in the Series.
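The NaN-skipping behavior is controlled by the skipna parameter, and the same method exists on DataFrames with an axis argument. The following sketch uses made-up data to show both:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, np.nan, 4])

skipped = ser.max()                 # skipna=True is the default
propagated = ser.max(skipna=False)  # NaN propagates to the result

# Illustrative DataFrame with one NaN in each column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 5.0]})
col_max = df.max()        # one maximum per column (axis=0)
row_max = df.max(axis=1)  # one maximum per row

print(skipped)           # 4.0
print(propagated)        # nan
print(col_max.tolist())  # [1.0, 5.0]
print(row_max.tolist())  # [1.0, 5.0]
```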

Method 4: Custom Function with Filter

A custom solution can filter out NaNs with a list comprehension, a generator expression, or the filter() function before applying Python’s built-in max() function.

Here’s an example:

import math

arr = [1, 2, float('nan'), 4]
max_val = max(x for x in arr if not math.isnan(x))

print(max_val)

Output:

4

The code uses a generator expression to exclude NaN values before calculating the maximum. The math.isnan() function checks each element for NaN, and the built-in max() function computes the maximum of the values that remain. The result is the integer 4 rather than 4.0 because the list holds Python integers and no conversion to a float array takes place.
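If the input is empty or contains only NaNs, the generator yields nothing and max() raises a ValueError. One way to handle that, sketched here with a hypothetical helper named max_ignoring_nan, is to use the default argument of max():

```python
import math

def max_ignoring_nan(values, default=None):
    """Return the largest non-NaN value, or `default` if there is none."""
    # `default` is returned when the filtered iterable is empty.
    return max((x for x in values if not math.isnan(x)), default=default)

print(max_ignoring_nan([1, 2, float("nan"), 4]))  # 4
print(max_ignoring_nan([float("nan")]))           # None
print(max_ignoring_nan([], default=0))            # 0
```

Returning None for the empty case is a design choice for this sketch; raising an error or returning NaN may fit other code bases better.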

Bonus One-Liner Method 5: Using Max with a Conditional Expression

For a concise one-liner that needs no imports, you can use the built-in max() function with a generator expression that keeps only the values satisfying x == x.

Here’s an example:

arr = [1, 2, float('nan'), 4]
max_val = max(x for x in arr if x == x)

print(max_val)

Output:

4

This one-liner leverages the fact that NaN is the only floating-point value that is not equal to itself. The condition x == x is therefore False for NaN and True for every other number, so NaN values are filtered out directly inside the call to max(), which then computes the maximum correctly.
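The self-comparison behavior is easy to check in isolation. This short sketch (variable names are illustrative) shows the comparison results and the values that survive the filter:

```python
nan = float("nan")

print(nan == nan)  # False
print(nan != nan)  # True
print(4 == 4)      # True

# Only values equal to themselves pass the filter.
kept = [x for x in [1, 2, nan, 4] if x == x]
print(kept)  # [1, 2, 4]
```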

Summary/Discussion

Method 1: numpy.nanmax. Straightforward and efficient, with axis support. Requires NumPy, but accepts plain lists as well as arrays.
Method 2: numpy.amax with a Mask. Flexible, since the mask can be combined with other conditions. More verbose, requires NumPy.
Method 3: pandas.Series.max. Integrates easily with pandas Series and DataFrames. Adds a heavy dependency if you are not already using pandas.
Method 4: Custom Function with Filter. No extra dependencies, works on lists. More code needed, and slower than NumPy on large inputs because it loops in Python.
Method 5: Max with Conditional Expression. Quick one-liner, no imports. The x == x trick can be less readable and is not obvious to beginners.