5 Best Ways to Count Rows and Columns in a Pandas DataFrame

💡 Problem Formulation: When working with data in Python, it’s crucial to quickly assess the structure of your DataFrame. Whether you’re pre-processing data or ensuring data quality, knowing the number of rows and columns can guide your next steps. Suppose you have a DataFrame df and want to determine its dimensions; specifically, you’re looking for output resembling (number_of_rows, number_of_columns).

Method 1: Using shape Attribute

The shape attribute of a Pandas DataFrame returns a tuple representing the dimensions of the DataFrame, with the first element being the number of rows and the second the number of columns. This method is direct and efficient, especially when you need both dimensions quickly.

Here’s an example:

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.shape)

Output:

(3, 2)

This example creates a simple DataFrame and prints its dimensions using the shape attribute. The resulting tuple (3, 2) indicates that the DataFrame has 3 rows and 2 columns.
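Because shape is an ordinary tuple, it can also be unpacked into two variables. The following is a minimal sketch; the names n_rows and n_cols are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape is a plain (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")
```

Unpacking avoids indexing into the tuple later and makes it obvious which number is which.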

Method 2: Using len() and shape

Another way to count the rows in a DataFrame is to pass it to Python’s built-in len() function, which returns the number of rows. For the number of columns, you can access the second element of the shape tuple.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
num_rows = len(df)
num_columns = df.shape[1]
print("Rows:", num_rows)
print("Columns:", num_columns)

Output:

Rows: 3
Columns: 2

In this snippet, we count the number of rows using len(df), which gives us the first dimension of the DataFrame structure. The number of columns is obtained by accessing the second element of the shape tuple with df.shape[1].
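As a quick check, len() should agree with shape[0] after filtering as well, including when the filter leaves no rows. The sketch below assumes two illustrative filters on column A; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Keep only rows where A is greater than 1 (two rows remain)
filtered = df[df['A'] > 1]

# A filter that matches nothing leaves zero rows but keeps both columns
empty = df[df['A'] > 10]

print(len(filtered), filtered.shape[0])
print(len(empty), empty.shape)
```

An empty result still reports its column count through shape, which is handy when checking whether a filter removed everything.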

Method 3: Using len() with df.columns and df.index

The lengths of the df.columns and df.index properties can also be used to determine the number of columns and rows, respectively. This approach uses Pandas’ built-in properties for the column labels and the index (row labels).

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
rows_count = len(df.index)
columns_count = len(df.columns)
print("Rows:", rows_count)
print("Columns:", columns_count)

Output:

Rows: 3
Columns: 2

This code counts the number of rows by determining the length of the DataFrame’s index, and similarly, the number of columns by counting the DataFrame’s columns. It’s straightforward and clearly communicates the intent of the operation.
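The length of the index counts labels, not label values, so it stays correct with a non-default index. A small sketch, assuming an illustrative integer index of 10, 20, 30:

```python
import pandas as pd

# Custom row labels; the values 10, 20, 30 are arbitrary
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

rows_count = len(df.index)       # number of row labels
columns_count = len(df.columns)  # number of column labels
largest_label = df.index.max()   # a label value, not a row count

print("Rows:", rows_count)
print("Columns:", columns_count)
print("Largest index label:", largest_label)
```

The largest label here is 30 while the row count is 3, so it is safer to rely on len(df.index) than on the last or largest index value.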

Method 4: Using DataFrame’s count() Method

The count() method in Pandas returns the number of non-NA/null values along the given axis. With the default axis=0, it returns one count per column (the non-NA rows in that column); with axis=1, it returns one count per row (the non-NA columns in that row). The result matches the true number of rows or columns only when no values are missing.

Here’s an example:

import pandas as pd

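df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})
# axis=1: number of non-NA values in each row
non_na_per_row = df.count(axis=1)
# Default axis=0: number of non-NA values in each column
non_na_per_column = df.count()
print("Rows with non-NA values per column:\n", non_na_per_column)
print("Columns with non-NA values per row:\n", non_na_per_row)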

Output:

Rows with non-NA values per column:
 A    2
B    2
dtype: int64
Columns with non-NA values per row:
 0    1
1    2
2    1
dtype: int64

This code snippet demonstrates how to count non-NA values per column and per row. It’s slightly more complex than the previous methods because each call returns a Series rather than a single number, and the counts fall short of the true row and column totals (3 and 2) wherever values are missing.
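To make the difference from shape concrete, the sketch below compares the total row count with the non-NA counts on the same DataFrame. The variable names are illustrative, and dropna() is used here only to count rows that have no missing values.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

total_rows = df.shape[0]           # counts every row, missing values or not
non_na_per_column = df.count()     # non-NA values in each column
missing_per_column = total_rows - non_na_per_column
complete_rows = len(df.dropna())   # rows with no missing values at all

print("Total rows:", total_rows)
print("Missing values per column:", missing_per_column.to_dict())
print("Rows without any missing value:", complete_rows)
```

Here count() reports 2 for each column even though the DataFrame has 3 rows, so it is better suited to data-quality checks than to measuring dimensions.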

Bonus One-Liner Method 5: Using DataFrame’s size Attribute

The size attribute returns the total number of elements in the DataFrame, which is the product of the number of rows and columns. This single number can be useful for quick size checks.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
total_elements = df.size
print("Total elements (rows * columns):", total_elements)

Output:

Total elements (rows * columns): 6

This code simply reads the size attribute of the DataFrame to get the total number of elements. While it does not directly give the row or column count, it’s a quick way to check the DataFrame’s overall size.
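The relationship between size and shape can be verified directly. This sketch reuses a DataFrame with missing values to show that size counts every cell, including NA ones; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

n_rows, n_cols = df.shape
total_cells = df.size                  # every cell, NA included
non_na_cells = int(df.count().sum())   # only cells holding a value

print("size equals rows * columns:", total_cells == n_rows * n_cols)
print("Total cells:", total_cells)
print("Non-NA cells:", non_na_cells)
```

The gap between the two totals is the number of missing cells in the DataFrame.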

Summary/Discussion

  • Method 1: Using shape. Quick and easy. Provides both row and column counts directly in a tuple.
  • Method 2: Using len() with shape. Gives individual counts explicitly for rows and columns. Requires two separate calls.
  • Method 3: Using len() with df.columns and df.index. Semantically clear, as it directly accesses the row index and column labels. Not as concise as the shape attribute.
  • Method 4: Using DataFrame’s count() Method. Counts non-NA cells, providing more detail but not the overall structure.
  • Bonus Method 5: Using size. Offers a quick one-number summary of DataFrame size. Does not reveal the individual dimensions on its own.
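The methods above can be gathered into one small helper for comparison. This is only a sketch: dimension_summary is a hypothetical function written for this article, not part of Pandas.

```python
import pandas as pd


def dimension_summary(df):
    """Hypothetical helper: collect the counts discussed in this article."""
    n_rows, n_cols = df.shape
    # The different ways of counting should agree with each other
    assert len(df) == len(df.index) == n_rows
    assert len(df.columns) == n_cols
    return {
        "rows": n_rows,
        "columns": n_cols,
        "cells": df.size,
        "non_na_cells": int(df.count().sum()),
    }


df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})
summary = dimension_summary(df)
print(summary)
```

For everyday use, shape alone is usually enough; a helper like this is mainly useful when you also want to track missing values.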