How to Count Occurrences of Elements in Pandas? 

Rate this post

When we’re working on real-world data, the data files that we receive will be huge. In such cases, cleansing the data becomes important. Often, we might have to count the occurrences of elements in a column. Doing so, gives us insights like:

  • The number of times the value has been duplicated.
  • Count corresponding to a particular value.
  • Checking the number of missing values from the input.

Consider a data frame as shown below.

data = {'Name' : ['Alice','Bob','Steve','George','Bob','Steve','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}

We can count the occurrence of the elements in a column to get some insights into the data.

For example, to find the count of duplicates of the candidates, or to find the number of male and female candidates or to find the number of missing rank values, etc.

In this article, let’s learn different methods to find the count of occurrences of the elements in a column. Without any further delay, let’s get started.

Method 1: Using dataframe.value_counts()

We can use the dataframe.value_counts() method to find the count of occurrences of the elements in a column.

Just say,

dataframe['ColumnName'].value_counts()

If you want to find the count of occurrences of a combination of columns use the following,

dataframe[['ColumnName1','ColumnName2',....,'ColumnNameN'].value_counts()

Example: Let’s say we have to find the count of male and female candidates from the above dataframe. We can use dataframe.count_values() as shown below.

import numpy as np
import pandas as pd


data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

print(df['Gender'].value_counts())

Output:

M    5
F    2
Name: Gender, dtype: int64

As we can see, it gives us the count of male and female candidates. On careful examination, we can notice that the count of nan values is not shown.

By default, dataframe.value_counts() does not specify the missing values. To specify the missing values, we have to pass a parameter dropna=False as shown below.

dataframe.value_counts(dropna=False)

Let’s use this in our example and see what happens.

import numpy as np
import pandas as pd


data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

print(df['Gender'].value_counts(dropna=False))

Output:

M      5
F      2
NaN    1
Name: Gender, dtype: int64

As seen in the output, the count of missing values is displayed.

We can also find the count of occurrences from the combination of columns as shown below.

import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5],
       'Value': [1,2,3,1,2,3,1,2]}
df=pd.DataFrame(data)
print(df[['Name','Rank']].value_counts())

Output:

Name     Rank
Alice    3.0     1
Bob      4.0     1
Bounty   5.0     1
George   4.0     1
Scottie  5.0     1
Sitara   6.0     1

Method 2: Using dataframe.groupby()

We can also use the methods dataframe.groupby().size() or dataframe.groupby().count() to find the count of occurrences of the Column Values using the below syntax.

dataframe.groupby('ColumnName').size()

or

dataframe.groupby('ColumnName').count()

If you count the ocurrences of the group of multiple columns use,

dataframe.groupby(['ColumnName1','ColumnName2',......'ColumnNameN']).size()

or

dataframe.groupby(['ColumnName1','ColumnName2',......'ColumnNameN']).count()

Example: Let’s say we have to find the count of occurrences of the name. We can use dataframe.groupby().size()  as shown below

import numpy as np
import pandas as pd


data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)
print(df.groupby('Name').size())

Output:

Name
Alice      1
Bob        1
Bounty     1
George     2
Scottie    1
Sitara     1
Steve      1
dtype: int64

Alternatively, we can also use dataframe.groupby().count() as shown below.

import numpy as np
import pandas as pd


data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)
print(df.groupby('Name').count())

Output:

         Gender  Rank
Name                 
Alice         1     1
Bob           1     1
Bounty        0     1
George        2     1
Scottie       1     1
Sitara        1     1
Steve         1     0

We can also find the count of occurrences from the combination of columns using groupby() as shown below.

import numpy as np
import pandas as pd
data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5],
       }
df=pd.DataFrame(data)
print(df.groupby(['Name','Rank']).count())

Output:

             Gender
Name    Rank        
Alice   3.0        1
Bob     4.0        1
Bounty  5.0        0
George  4.0        1
Scottie 5.0        1
Sitara  6.0        1

Measuring the time complexity of these methods

Now that we’ve seen different ways of finding the count of occurrences of the Col elements, let’s have a look at the execution times using different methods.

import numpy as np
import pandas as pd
from time import perf_counter_ns

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

# Using groupby.size()
start=perf_counter_ns()
df.groupby('Name').size()
end=perf_counter_ns()
print("Time taken uisng groupby().size()  : %7dns"%(end-start))

# Using groupby().count()
start=perf_counter_ns()
df.groupby('Name').count()
end=perf_counter_ns()
print("Time taken uisng groupby().count() : %7dns"%(end-start))

# Using value_counts()
start=perf_counter_ns()
df['Name'].value_counts()
end=perf_counter_ns()
print("Time taken uisng value_counts()    : %7dns"%(end-start))

Output:

Time taken using groupby().size()   :  820000ns
Time taken using groupby().count() :  666000ns
Time taken using value_counts()     :  235800ns

When we’re finding the count of occurrences of one column, value_counts() is faster.

However, when we’re finding the count of occurrences from a combination of several columns, groupby() methods are faster. Refer to the below code for more details.

import numpy as np
import pandas as pd
from time import perf_counter_ns

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
       'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
       'Rank': [3,4,np.nan,4,5,6,np.nan,5],
       }
df=pd.DataFrame(data)

# Using groupby.size()
start=perf_counter_ns()
df.groupby(['Name','Rank']).size()
end=perf_counter_ns()
print("Time taken uisng groupby().size()  : %7dns"%(end-start))

# Using groupby().count()
start=perf_counter_ns()
df.groupby(['Name','Rank']).count()
end=perf_counter_ns()
print("Time taken uisng groupby().count() : %7dns"%(end-start))

# Using value_counts()
start=perf_counter_ns()
df[['Name','Rank','Gender']].value_counts()
end=perf_counter_ns()
print("Time taken uisng value_counts()    : %7dns"%(end-start))

Output:

Time taken uisng groupby().size()  :  544800ns
Time taken uisng groupby().count() :  653100ns
Time taken uisng value_counts()    :  925300ns

Conclusion

That brings us to the end of this short tutorial. We have seen different methods to count the occurrences of the values in a column. We have also checked the time complexities of these methods. 

We hope this has been informative. Thanks for reading. Do you have any questions? Feel free to comment and let us know. For more interesting content, subscribe to our email newsletter: