Pandas is Excel on steroids—the powerful Python library allows you to analyze structured and tabular data with surprising efficiency and ease. Pandas is one of the reasons why master coders reach 100x the efficiency of average coders. In today’s article, you’ll learn how to work with missing data—in particular, how to handle NaN values in Pandas DataFrames.
You’ll learn about all the different reasons why NaNs appear in your DataFrames—and how to handle them. Let’s get started!
Checking Series for NaN Values
Problem: How to check a Series for NaN values?
Have a look at the following code:
import pandas as pd
import numpy as np

data = pd.Series([0, np.nan, 2])
result = data.hasnans
print(result)  # True
Series can contain NaN values—an abbreviation for Not-A-Number—that represent undefined or missing values.

To check whether a Series contains one or more NaN values, use the attribute hasnans. The attribute returns True if there is at least one NaN value and False otherwise.

There's a NaN value in the Series, so the output is True.
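If you also want to know which elements are missing or how many there are, the isna() method gives you a Boolean mask. Here's a minimal sketch, reusing the same Series as above:

import pandas as pd
import numpy as np

data = pd.Series([0, np.nan, 2])

# Boolean mask marking the NaN positions:
print(data.isna().tolist())  # [False, True, False]

# Equivalent to hasnans:
print(data.isna().any())     # True

# Number of NaN values:
print(data.isna().sum())     # 1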
Filtering Series Generates NaN
Problem: When you filter a Series with where(), what happens to the elements that don't pass the filtering condition?
import pandas as pd

xs = pd.Series([5, 1, 4, 2, 3])
xs.where(xs > 2, inplace=True)
result = xs.hasnans
print(result)  # True
The method where() filters a Series by a condition. Only the elements that satisfy the condition keep their values in the resulting Series. And what happens to a value that doesn't satisfy the condition? By default, all rows that don't satisfy the condition are replaced with NaN values.

This is why our Series contains NaN values after filtering it with the method where().
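By the way, if you'd rather not have NaN placeholders at all, where() accepts a second argument (other) that is used for the failing elements instead. A minimal sketch, reusing the Series from above:

import pandas as pd

xs = pd.Series([5, 1, 4, 2, 3])

# Replace elements that fail the condition with 0 instead of NaN:
result = xs.where(xs > 2, other=0)
print(result.tolist())  # [5, 0, 4, 0, 3]
print(result.hasnans)   # False

Because no NaN values are introduced, the Series also keeps its integer dtype.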
Working with Multiple Series of Different Lengths
Problem: If you element-wise add two Series objects with a different number of elements—what happens to the remaining elements?
import pandas as pd

s = pd.Series(range(0, 10))
t = pd.Series(range(0, 20))
result = (s + t)[1]
print(result)  # 2.0
To add two Series element-wise, use the default addition operator +. The Series do not need to have the same size: once the shorter Series ends, the remaining element-wise results are NaN values.

At index 1 in the resulting Series, you get the result of 1 + 1 = 2. Because the NaN values force the result into a float dtype, the printed value is 2.0.
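If you'd rather treat the missing elements of the shorter Series as 0 instead of producing NaN values, you can use the add() method with a fill_value. A minimal sketch:

import pandas as pd

s = pd.Series(range(0, 10))
t = pd.Series(range(0, 20))

# Treat the missing elements of the shorter Series as 0:
result = s.add(t, fill_value=0)
print(result.hasnans)  # False
print(result[15])      # 15.0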
Create a DataFrame From a List of Dictionaries with Unequal Keys
Problem: How to create a DataFrame from a list of dictionaries if the dictionaries have unequal keys? A DataFrame expects the same columns to be available for each row!
import pandas as pd

data = [{'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
        {'Car': 'Ferrari', 'Driver': 'Schumacher, Michael'},
        {'Car': 'Lamborghini'}]
df = pd.DataFrame(data, index=['Rank 2', 'Rank 1', 'Rank 3'])
df.sort_index(inplace=True)
result = df['Car'].iloc[0]
print(result)  # Ferrari
You can create a DataFrame from a list of dictionaries. The dictionaries' keys define the column labels, and the values define the columns' entries. Not all dictionaries must contain the same keys. If a dictionary doesn't contain a particular key, the missing entry is interpreted as a NaN value.

This code snippet uses string labels as index values to sort the DataFrame. After sorting the DataFrame, the row with index label Rank 1 is at location 0 in the DataFrame, and the value in the column Car is Ferrari.
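You can verify the missing entry yourself with loc and isna(). A minimal sketch, reusing the DataFrame from above:

import pandas as pd

data = [{'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
        {'Car': 'Ferrari', 'Driver': 'Schumacher, Michael'},
        {'Car': 'Lamborghini'}]
df = pd.DataFrame(data, index=['Rank 2', 'Rank 1', 'Rank 3'])

# The third dictionary has no 'Driver' key, so the entry is NaN:
print(df.loc['Rank 3', 'Driver'])  # nan
print(df['Driver'].isna().sum())   # 1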
Sorting a DataFrame by Column with NaN Values
Problem: What happens if you sort a DataFrame by a column that contains a NaN value?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

selection = df.sort_values(by="engine-size")
result = selection.index.to_list()[0]
print(result)  # 1
In this code snippet, you sort the rows of the DataFrame by the values of the column engine-size.

The main point is that, by default, Pandas moves NaN values to the end when sorting (na_position='last'). Thus, the first value is 1.8, which belongs to the row with index value 1.
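If you want the NaN rows at the top instead, sort_values() accepts the parameter na_position. A minimal sketch that rebuilds the relevant columns inline instead of reading Cars.csv:

import pandas as pd
import numpy as np

df = pd.DataFrame({"make": ["audi", "dodge", "mazda", "porsche", "volvo"],
                   "engine-size": [2.0, 1.8, np.nan, 6.0, 2.0]})

# Put the rows with NaN in 'engine-size' first instead of last:
selection = df.sort_values(by="engine-size", na_position="first")
print(selection.index.to_list()[0])  # 2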
Count Non-NaN Values
Problem: How to count the number of elements in a DataFrame column that are not NaN?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

result = df.count().iloc[5]
print(result)  # 4
The method count() returns the number of non-NaN values for each column. The DataFrame df has five rows. The column at positional index 5, engine-size, contains one NaN value. Therefore, its count is 4.
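If you want to count the missing values directly rather than the non-missing ones, combine isna() with sum(). A minimal sketch that rebuilds a few columns of the DataFrame inline:

import pandas as pd
import numpy as np

df = pd.DataFrame({"make": ["audi", "dodge", "mazda", "porsche", "volvo"],
                   "price": [30000, 17000, 17000, 120000, 25000],
                   "engine-size": [2.0, 1.8, np.nan, 6.0, 2.0]})

# Non-NaN values per column:
print(df.count()["engine-size"])       # 4

# NaN values per column:
print(df.isna().sum()["engine-size"])  # 1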
Drop NaN-Values
Problem: How to drop all rows that contain a NaN value in any of their columns—and how to restrict this check to certain columns?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

selection1 = df.dropna(subset=["price"])
selection2 = df.dropna()
print(len(selection1), len(selection2))  # 5 4
The DataFrame's dropna() method drops all rows that contain a NaN value in any of their columns. But how to restrict the columns to be scanned for NaN values?

By passing a list of column labels to the optional parameter subset, you define which columns you want to consider.

The call of dropna() without restriction drops the row with index 2 because of the NaN value in the column engine-size. When you restrict the check to the column price, no rows are dropped because that column contains no NaN value.
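dropna() also accepts the parameters how and thresh, which control how many NaN values a row may contain before it is dropped. A minimal sketch with a small made-up DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({"make": ["audi", "mazda", "ford"],
                   "price": [30000, 17000, np.nan],
                   "engine-size": [2.0, np.nan, np.nan]})

# Drop rows only if *all* their values are NaN (no such row here):
print(len(df.dropna(how="all")))  # 3

# Keep only rows with at least two non-NaN values:
print(len(df.dropna(thresh=2)))   # 2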
Drop NaN and Reset Index
Problem: What happens to indices after dropping certain rows?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

df.drop([0, 1, 2], inplace=True)
df.reset_index(inplace=True)
result = df.index.to_list()
print(result)  # [0, 1]
The method drop() of a DataFrame deletes rows or columns by index. You can pass either a single value or a list of values.

By default, the inplace parameter is set to False, so modifications won't affect the initial DataFrame object. Instead, the method returns a modified copy of the DataFrame. In the puzzle, you set inplace to True, so the deletions are performed directly on the DataFrame.

After deleting the first three rows, the remaining index labels are 3 and 4. You can reset the default indexing by calling the method reset_index() on the DataFrame so that the index starts at 0 again. As there are only two rows left in the DataFrame, the result is [0, 1].
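Note that reset_index() keeps the old index as a new column named index by default. If you only want the fresh integer index, pass drop=True. A minimal sketch with a small made-up DataFrame:

import pandas as pd

df = pd.DataFrame({"make": ["audi", "dodge", "mazda", "porsche", "volvo"]})
df.drop([0, 1, 2], inplace=True)

# Discard the old index labels 3 and 4 instead of keeping them as a column:
df.reset_index(drop=True, inplace=True)
print(df.index.to_list())    # [0, 1]
print(df.columns.to_list())  # ['make']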
Concatenation of Dissimilar DataFrames Filled With NaN
Problem: How to concatenate two DataFrames if they have different columns?
import pandas as pd

df = pd.read_csv("Cars.csv")
df2 = pd.read_csv("Cars2.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

# Additional Dataframe "df2"
# ----------
#      make   origin
# 0   skoda  Czechia
# 1  toyota    Japan
# 2    ford      USA
# ----------

try:
    result = pd.concat([df, df2], axis=0, ignore_index=True)
    print("Y")
except Exception:
    print("N")
# Y
Even if DataFrames have different columns, you can concatenate them.

If DataFrame 1 has columns A and B and DataFrame 2 has columns C and D, the result of concatenating DataFrames 1 and 2 is a DataFrame with columns A, B, C, and D. Missing values in the rows are filled with NaN.
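Here's a minimal, self-contained sketch of this behavior with two tiny made-up DataFrames:

import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"C": [5], "D": [6]})

# The result has columns A, B, C, D; missing entries become NaN:
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result.columns.to_list())  # ['A', 'B', 'C', 'D']
print(result["A"].isna().sum())  # 1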
Outer Merge
Problem: When merging (=joining) two DataFrames—what happens if there are missing values?
import pandas as pd

df = pd.read_csv("Cars.csv")
df2 = pd.read_csv("Cars2.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

# Additional dataframe "df2"
# ----------
#     make   origin
# 0  skoda  Czechia
# 1  mazda    Japan
# 2   ford      USA
# ----------

result = pd.merge(df, df2, how="outer", left_on="make", right_on="make")
print(len(result["fuel"]))
print(result["fuel"].count())
# 7
# 5
With the Pandas function merge() and the parameter how set to "outer", you perform an outer join.

The resulting DataFrame of an outer join contains all key values from both input DataFrames; missing values are filled with NaN.

In addition, this puzzle shows that the len() function counts NaN values, whereas the method count() does not include them.
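If you also want to see which side of the outer join each row came from, merge() supports the parameter indicator. A minimal sketch with two small made-up DataFrames:

import pandas as pd

left = pd.DataFrame({"make": ["audi", "mazda"], "fuel": ["gas", "diesel"]})
right = pd.DataFrame({"make": ["mazda", "ford"], "origin": ["Japan", "USA"]})

# The '_merge' column tells you where each row came from:
result = pd.merge(left, right, how="outer", on="make", indicator=True)
print(len(result))                         # 3
print((result["_merge"] == "both").sum())  # 1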
Replacing NaN
Problem: How to replace all NaN values in a DataFrame with a given value?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

df.fillna(2.0, inplace=True)
result = df["engine-size"].sum()
print(result)  # 13.8
The method fillna() replaces NaN values with a given value. After filling the missing entry with 2.0, the sum of all values in the column engine-size is 2.0 + 1.8 + 2.0 + 6.0 + 2.0 = 13.8.
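Instead of a hard-coded constant, you can also fill NaN values with a statistic such as the column mean. A minimal sketch with the engine-size values as an inline Series:

import pandas as pd
import numpy as np

engine_size = pd.Series([2.0, 1.8, np.nan, 6.0, 2.0])

# Fill the missing entry with the mean of the known values:
filled = engine_size.fillna(engine_size.mean())
print(round(filled[2], 2))     # 2.95
print(round(filled.sum(), 2))  # 14.75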
Length vs. Count Difference — It’s NaN!
Problem: What's the difference between the len() function and the count() method?
import pandas as pd

df = pd.read_csv("Cars.csv")
df2 = pd.read_csv("Cars2.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

# Additional dataframe "df2"
# ----------
#     make   origin
# 0  skoda  Czechia
# 1  mazda    Japan
# 2   ford      USA
# ----------

result = pd.merge(df2, df, how="left", left_on="make", right_on="make")
print(len(result["fuel"]))
print(result["fuel"].count())
# 3
# 1
In a left join, the left DataFrame is the master, and all of its values are included in the resulting DataFrame.

Therefore, the result DataFrame contains three rows. Yet, since skoda and ford don't appear in DataFrame df, only the row for mazda contains a fuel value.

Again, we see the difference between the function len(), which also counts NaN values, and the method count(), which does not count NaN values.
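The number of missing values in a column is simply the difference between the two. A minimal sketch with a small made-up Series:

import pandas as pd
import numpy as np

fuel = pd.Series([np.nan, "diesel", np.nan])

print(len(fuel))                 # 3
print(fuel.count())              # 1
print(len(fuel) - fuel.count())  # 2 -> number of NaN values
print(fuel.isna().sum())         # 2 -> same result via isna()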
Equals() vs. == When Comparing NaN
Problem: What's the difference between comparing columns with the operator == and with the method equals() when NaN values are involved?
import pandas as pd

df = pd.read_csv("Cars.csv")

# Dataframe "df"
# ----------
#       make    fuel aspiration   body-style   price  engine-size
# 0     audi     gas      turbo        sedan   30000          2.0
# 1    dodge     gas        std        sedan   17000          1.8
# 2    mazda  diesel        std        sedan   17000          NaN
# 3  porsche     gas      turbo  convertible  120000          6.0
# 4    volvo  diesel        std        sedan   25000          2.0
# ----------

df["engine-size_copy"] = df["engine-size"]
check1 = (df["engine-size_copy"] == df["engine-size"]).all()
check2 = df["engine-size_copy"].equals(df["engine-size"])
print(check1 == check2)  # False
This code snippet shows how to compare columns or entire DataFrames with respect to their shape and elements.

The comparison using the operator == returns False for our DataFrame because comparing NaN values with == always yields False.

On the other hand, the method equals() compares two Series or DataFrames. In this case, NaN values in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must be of the same dtype.

Since the result of check1 is False and the result of check2 is True, the final output is False.
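The root cause is that NaN is never equal to itself under ==. If you need an element-wise comparison that also treats NaN values in the same position as equal, you can combine == with isna(). A minimal sketch:

import pandas as pd
import numpy as np

a = pd.Series([2.0, np.nan, 6.0])
b = pd.Series([2.0, np.nan, 6.0])

print(np.nan == np.nan)  # False
print((a == b).all())    # False -> the NaN position compares as unequal
print(a.equals(b))       # True  -> equals() treats aligned NaNs as equal

# Element-wise comparison that also accepts NaN == NaN:
same = (a == b) | (a.isna() & b.isna())
print(same.all())        # True

The last expression marks a position as equal if the values match or if both values are NaN, which is often what you want when validating copied columns.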
Where to Go From Here?
Enough theory. Let’s get some practice!
Coders get paid six figures and more because they can solve problems more effectively using machine intelligence and automation.
To become more successful in coding, solve more real problems for real people. That’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?
You build high-value coding skills by working on practical coding projects!
Do you want to stop learning with toy projects and focus on practical code projects that earn you money and solve real problems for people?
🚀 If your answer is YES!, consider becoming a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.
If you just want to learn about the freelancing opportunity, feel free to watch my free webinar “How to Build Your High-Income Skill Python” and learn how I grew my coding business online and how you can, too—from the comfort of your own home.

While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.