5 Best Ways to Display a Specific Number of Rows from a Pandas DataFrame

💡 Problem Formulation: When you’re working with large data sets in Python’s Pandas library, you may often need to inspect a subset of your DataFrame. Whether it’s for a quick check or for detailed analysis, knowing how to efficiently display a specified number of rows is a fundamental skill. This article demonstrates how to accomplish this, aiming to take an input DataFrame and display, for instance, the first five rows as the desired output.

Method 1: Using head()

One of the most straightforward ways to display the first N rows of a DataFrame is the head() method, which returns the first N rows by position. If no argument is provided, it returns the first five rows.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
# Display the first 3 rows
print(df.head(3))

Output:

   A  B
0  1  5
1  2  4
2  3  3

This code snippet creates a DataFrame with columns ‘A’ and ‘B’, then uses the head() method to display the first three rows. This method is popular due to its simplicity and ease of use.
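Beyond a positive count, head() also accepts a negative value and tolerates a count larger than the DataFrame. The short sketch below reuses the sample DataFrame from above; the variable names are illustrative only.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# No argument: the first five rows (here, the whole DataFrame)
default_rows = df.head()

# Negative n: every row except the last 2
all_but_last_two = df.head(-2)
print(all_but_last_two)

# n larger than the DataFrame does not raise; all rows are returned
oversized = df.head(100)
print(len(oversized))
```

A negative argument can be handy when the number of trailing rows to drop is known but the total length is not.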

Method 2: Using tail()

To display the last N rows of a DataFrame, use the tail() method. Like head(), it defaults to five rows, returning the last five if no argument is specified.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [1, 2, 3, 4, 5]})
# Display the last two rows
print(df.tail(2))

Output:

   X  Y
3  D  4
4  E  5

In this example, the tail() function is called on the DataFrame to retrieve and print the last two rows. This method is as intuitive as the first one, giving users a fast way to access the bottom rows of their data.
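Note that tail() keeps the original index labels (3 and 4 in the output above). As a small sketch, assuming you want a fresh 0-based index for display, reset_index(drop=True) can be chained on; a negative argument works as the mirror image of head(). Variable names here are illustrative.

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [1, 2, 3, 4, 5]})

# Last two rows, renumbered from 0 instead of keeping labels 3 and 4
last_two = df.tail(2).reset_index(drop=True)
print(last_two)

# Negative n: every row except the first 2
all_but_first_two = df.tail(-2)
print(all_but_first_two)
```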

Method 3: Using Slicing with iloc[]

For a more customizable approach, slicing with the iloc[] indexer allows specific rows to be selected by position. The iloc[] indexer syntax is similar to slicing lists in Python.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'One': range(10), 'Two': range(10, 20)})
# Display rows 2 through 4
print(df.iloc[2:5])

Output:

   One  Two
2    2   12
3    3   13
4    4   14

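The code above uses iloc[] to select and display the rows at positions 2 through 4: as with Python list slicing, the start position (2) is included and the stop position (5) is excluded. Because iloc[] works on positions rather than index labels, it behaves the same way regardless of how the DataFrame is indexed.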
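The difference between position and label becomes visible once the index is not the default 0..N-1. The sketch below assumes a hypothetical string index added to the same data, and compares iloc[] with the label-based loc[] indexer.

```python
import pandas as pd

# Same data as above, but with letter labels as the index (an assumption
# made for illustration; the original example uses the default index)
df = pd.DataFrame({'One': range(10), 'Two': range(10, 20)},
                  index=list('abcdefghij'))

# Positional slice: positions 2, 3, 4 (stop position 5 is excluded)
by_position = df.iloc[2:5]
print(by_position)

# Label slice: 'c' through 'e', with BOTH ends included
by_label = df.loc['c':'e']

# A positional slice that runs past the end is truncated, not an error
past_the_end = df.iloc[8:50]
print(len(past_the_end))
```

Here both slices return the same three rows, but they get there differently: iloc[] counts positions with an exclusive stop, while loc[] matches labels with an inclusive stop.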

Method 4: Using iloc[] with a Step Parameter

Another powerful feature of iloc[] is the ability to include a step parameter in the slice, which can be used to select alternate rows or apply more complex slicing logic.

Here’s an example:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'Alpha': list('abcdefghij'), 'Num': range(10)})
# Display every other row, starting from the first
print(df.iloc[::2])

Output:

  Alpha  Num
0     a    0
2     c    2
4     e    4
6     g    6
8     i    8

Here, iloc[::2] is used to specify that every other row starting from the first row should be returned. This method is useful when you need to inspect a pattern or subset of rows at regular intervals.
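The step can be combined with a start offset, or made negative to walk the rows in reverse. A brief sketch with the same sample DataFrame (variable names are illustrative):

```python
import pandas as pd

# Same sample DataFrame as in the example above
df = pd.DataFrame({'Alpha': list('abcdefghij'), 'Num': range(10)})

# Every third row, starting at position 1: positions 1, 4, 7
every_third = df.iloc[1::3]
print(every_third)

# Negative step reverses the row order; head(3) then keeps the
# last three rows of the original, newest first
last_three_reversed = df.iloc[::-1].head(3)
print(last_three_reversed)
```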

Bonus One-Liner Method 5: Using nrows in read_csv()

As a bonus, if the dataset is directly being read from a CSV file, the read_csv() function’s nrows parameter can be used to read a specific number of rows from the file into a DataFrame.

Here’s an example:

import pandas as pd

# Read the first 4 rows of the CSV file into a DataFrame
df = pd.read_csv('sample.csv', nrows=4)
print(df)

The output depends on the contents of ‘sample.csv’. With the default settings, the first line of the file is treated as the header, so the resulting DataFrame contains the first four data rows that follow it.

In this efficient one-liner, the reading process is limited to the first four rows of the CSV file. This technique can save memory and time when working with very large datasets.
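Since the example above needs a file on disk, here is a self-contained sketch that feeds read_csv() an in-memory string through io.StringIO instead. The CSV contents and column names are made up for illustration. It also shows nrows combined with skiprows to read a window from the middle of the data.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for 'sample.csv'
csv_text = "id,score\n1,90\n2,85\n3,70\n4,60\n5,55\n6,40\n"

# Read only the first 4 data rows; the header line is not counted
first_four = pd.read_csv(io.StringIO(csv_text), nrows=4)
print(first_four)

# Skip file lines 1 and 2 (line 0 is the header, so it is kept),
# then read the next 3 data rows
middle = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], nrows=3)
print(middle)
```

Keep in mind that rows excluded this way are never loaded, so they cannot be recovered later without reading the file again.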

Summary/Discussion

  • Method 1: head(). Strengths: Easy to remember and quick to use. Weaknesses: Only retrieves rows from the start of the DataFrame.
  • Method 2: tail(). Strengths: Symmetrical to head(), provides a quick look at the end of the DataFrame. Weaknesses: Only retrieves rows from the end of the DataFrame.
  • Method 3: Slicing with iloc[]. Strengths: Highly customizable and familiar to those who have worked with Python’s slicing syntax. Weaknesses: Can become complex with nonsequential row access.
  • Method 4: iloc[] with step parameter. Strengths: Allows for regular interval row selection. Weaknesses: The use of a step may omit rows that hold important data (if not used carefully).
  • Bonus Method 5: nrows in read_csv(). Strengths: Can improve performance by limiting data read into memory. Weaknesses: Applicable only when reading data from a CSV file.