How to Count Occurrences of Elements in Pandas?Β

3/5 - (1 vote)

When weβre working on real-world data, the data files that we receive will be huge. In such cases, cleansing the data becomes important. Often, we might have to count the occurrences of elements in a column. Doing so, gives us insights like:

• The number of times the value has been duplicated.
• Count corresponding to a particular value.
• Checking the number of missing values from the input.

Consider a data frame as shown below.

```data = {'Name' : ['Alice','Bob','Steve','George','Bob','Steve','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}```

We can count the occurrence of the elements in a column to get some insights into the data.

For example, to find the count of duplicates of the candidates, or to find the number of male and female candidates or to find the number of missing rank values, etc.

In this article, let’s learn different methods to find the count of occurrences of the elements in a column. Without any further delay, let’s get started.

Method 1: Using dataframe.value_counts()

We can use the `dataframe.value_counts()` method to find the count of occurrences of the elements in a column.

Just say,

`dataframe['ColumnName'].value_counts()`

If you want to find the count of occurrences of a combination of columns use the following,

`dataframe[['ColumnName1','ColumnName2',....,'ColumnNameN'].value_counts()`

Example: Let’s say we have to find the count of male and female candidates from the above dataframe. We can use `dataframe.count_values()` as shown below.

```import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

print(df['Gender'].value_counts())
```

Output:

```M    5
F    2
Name: Gender, dtype: int64```

As we can see, it gives us the count of male and female candidates. On careful examination, we can notice that the count of nan values is not shown.

By default, `dataframe.value_counts()` does not specify the missing values. To specify the missing values, we have to pass a parameter `dropna=False` as shown below.

`dataframe.value_counts(dropna=False)`

Letβs use this in our example and see what happens.

```import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

print(df['Gender'].value_counts(dropna=False))
```

Output:

```M      5
F      2
NaN    1
Name: Gender, dtype: int64
```

As seen in the output, the count of missing values is displayed.

We can also find the count of occurrences from the combination of columns as shown below.

```import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5],
'Value': [1,2,3,1,2,3,1,2]}
df=pd.DataFrame(data)
print(df[['Name','Rank']].value_counts())```

Output:

```Name     Rank
Alice    3.0     1
Bob      4.0     1
Bounty   5.0     1
George   4.0     1
Scottie  5.0     1
Sitara   6.0     1
```

Method 2: Using dataframe.groupby()

We can also use the methods `dataframe.groupby().size()` or `dataframe.groupby().count()` to find the count of occurrences of the Column Values using the below syntax.

`dataframe.groupby('ColumnName').size()`

or

`dataframe.groupby('ColumnName').count()`

If you count the ocurrences of the group of multiple columns use,

`dataframe.groupby(['ColumnName1','ColumnName2',......'ColumnNameN']).size()`

or

`dataframe.groupby(['ColumnName1','ColumnName2',......'ColumnNameN']).count()`

Example: Letβs say we have to find the count of occurrences of the name. We can use `dataframe.groupby().size()`  as shown below

```import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)
print(df.groupby('Name').size())
```

Output:

```Name
Alice      1
Bob        1
Bounty     1
George     2
Scottie    1
Sitara     1
Steve      1
dtype: int64
```

Alternatively, we can also use `dataframe.groupby().count()` as shown below.

```import numpy as np
import pandas as pd

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)
print(df.groupby('Name').count())```

Output:

```         Gender  Rank
Name
Alice         1     1
Bob           1     1
Bounty        0     1
George        2     1
Scottie       1     1
Sitara        1     1
Steve         1     0
```

We can also find the count of occurrences from the combination of columns using `groupby()` as shown below.

```import numpy as np
import pandas as pd
data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5],
}
df=pd.DataFrame(data)
print(df.groupby(['Name','Rank']).count())```

Output:

```             Gender
Name    Rank
Alice   3.0        1
Bob     4.0        1
Bounty  5.0        0
George  4.0        1
Scottie 5.0        1
Sitara  6.0        1
```

Measuring the time complexity of these methods

Now that weβve seen different ways of finding the count of occurrences of the Col elements, let’s have a look at the execution times using different methods.

```import numpy as np
import pandas as pd
from time import perf_counter_ns

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5]}
df=pd.DataFrame(data)

# Using groupby.size()
start=perf_counter_ns()
df.groupby('Name').size()
end=perf_counter_ns()
print("Time taken uisng groupby().size()  : %7dns"%(end-start))

# Using groupby().count()
start=perf_counter_ns()
df.groupby('Name').count()
end=perf_counter_ns()
print("Time taken uisng groupby().count() : %7dns"%(end-start))

# Using value_counts()
start=perf_counter_ns()
df['Name'].value_counts()
end=perf_counter_ns()
print("Time taken uisng value_counts()    : %7dns"%(end-start))```

Output:

```Time taken using groupby().size()   :  820000ns
Time taken using groupby().count() :  666000ns
Time taken using value_counts()     :  235800ns```

When weβre finding the count of occurrences of one column, `value_counts()` is faster.

However, when weβre finding the count of occurrences from a combination of several columns, `groupby()` methods are faster. Refer to the below code for more details.

```import numpy as np
import pandas as pd
from time import perf_counter_ns

data = {'Name' : ['Alice','Bob','Steve','George','Bounty','Sitara','George','Scottie'],
'Gender' : ['F','M','M','M',np.nan,'M','M','F'],
'Rank': [3,4,np.nan,4,5,6,np.nan,5],
}
df=pd.DataFrame(data)

# Using groupby.size()
start=perf_counter_ns()
df.groupby(['Name','Rank']).size()
end=perf_counter_ns()
print("Time taken uisng groupby().size()  : %7dns"%(end-start))

# Using groupby().count()
start=perf_counter_ns()
df.groupby(['Name','Rank']).count()
end=perf_counter_ns()
print("Time taken uisng groupby().count() : %7dns"%(end-start))

# Using value_counts()
start=perf_counter_ns()
df[['Name','Rank','Gender']].value_counts()
end=perf_counter_ns()
print("Time taken uisng value_counts()    : %7dns"%(end-start))```

Output:

```Time taken uisng groupby().size()  :  544800ns
Time taken uisng groupby().count() :  653100ns
Time taken uisng value_counts()    :  925300ns```

Conclusion

That brings us to the end of this short tutorial. We have seen different methods to count the occurrences of the values in a column. We have also checked the time complexities of these methods.

We hope this has been informative. Thanks for reading. Do you have any questions? Feel free to comment and let us know. For more interesting content, subscribe to our email newsletter: