How to Get the Last N Rows of a Pandas DataFrame?

5/5 - (4 votes)

In this tutorial we will unearth the solutions to three commonly asked questions that users come across while dealing with huge sets of data.

Problem Formulation

Given: Consider the following csv file (Note: You need to use it as a Pandas DataFrame).

import pandas as pd

df = pd.read_csv('countries.csv')
print(df)
    Country     Capital     Population      Area
0    Germany     Berlin     84,267,549      348,560
1    France      Paris      65,534,239      547,557
2    Spain       Madrid     46,787,468      498,800
3    Italy       Rome       60,301,346      294,140
4    India       Delhi      1,404,495,187   2,973,190
5    USA       Washington   334,506,463     9,147,420
6    China      Beijing     1,449,357,022   9,388,211
7    Poland      Warsaw     37,771,789      306,230
8    Russia      Moscow     146,047,418     16,376,870
9    England     London     68,529,747      241,930

Here’s the list of the questions that we will be focusing upon in this article:

  • How to get the last N rows of a Pandas DataFrame?
  • How to get last N rows from last N columns of a Pandas DataFrame?
  • How to read last N rows of a large csv file in Pandas?

Recommended Read: How to Select Rows From a DataFrame Based on Column Values?

Without further delay, let us dive into the solutions to the first question and learn how to get the last N rows of a Pandas DataFrame.

Method 1: Using iloc

Approach: Use the iloc property as pandas.DataFrame.iloc[-n:].

The iloc property is used to get or set the values of specified indices. Select the last n rows using the square bracket notation syntax [-n:] with the iloc property. Here, -n represents the index of the last n rows of the given pandas DataFrame.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')
rows = df.iloc[-5:]
print(rows)

Output:

      Country     Capital       Population      Area
5     USA         Washington    334,506,463     9,147,420
6     China       Beijing       1,449,357,022   9,388,211
7     Poland      Warsaw        37,771,789      306,230
8     Russia      Moscow        146,047,418     16,376,870
9     England     London        68,529,747      241,930

Method 2: Using tail()

Approach: Use the pandas.DataFrame.tail(n) to select the last n rows of the given DataFrame.

The tail(n) method returns n number of methods from the bottom end of the DataFrame. Here, n represents an integer that denotes the number of rows you want to fetch from the bottom end of the DataFrame.

Code:

import pandas as pd

df = pd.read_csv('countries.csv')
rows = df.tail(5)
print(rows)

Output:

      Country     Capital       Population      Area
5     USA         Washington    334,506,463     9,147,420
6     China       Beijing       1,449,357,022   9,388,211
7     Poland      Warsaw        37,771,789      306,230
8     Russia      Moscow        146,047,418     16,376,870
9     England     London        68,529,747      241,930

Well, that brings us to the next question in line – “How to get the last N rows from last N columns of a Pandas DataFrame?”

Method 1: Integer Based Indexing

Approach: Call pandas.DataFrame.iloc[-n:, -m:] to display last n rows from the last m columns of the given DataFrame.

Code: In the following code snippet we will fetch the last 5 rows from the last 2 columns, i.e., Population and Area.

import pandas as pd

df = pd.read_csv('countries.csv')
rows = df.iloc[-5:, -2:]
print(rows)

Output:

     Population         Area
5    334,506,463        9,147,420
6    1,449,357,022      9,388,211
7    37,771,789         306,230
8    146,047,418        16,376,870
9    68,529,747         241,930

Method 2: Name Based Indexing

In case, you happen to know the names of the specific columns and you want to get the last N records from the DataFrame from those columns then you can follow a two step process.

  • Call the Pandas.DataFrame.loc(:, 'start_column_name':'end_column_name') selector. It allows you to use slicing on column names instead of integer identifiers which can be more comfortable.
  • .loc is for label based indexing. Hence, the negative indices are not found and reindexed to NaN. Thus, to deal with this you have to use the tail() method to extract the last N records from the selected columns.

Code: The following code snippet shows how you can use the column names and fetch the corresponding values from the last 5 rows of the given Dataframe.

import pandas as pd

df = pd.read_csv('countries.csv')
rows = df.loc[:, 'Population':'Area']
print(rows.tail(5))

Output:

     Population         Area
5    334,506,463        9,147,420
6    1,449,357,022      9,388,211
7    37,771,789         306,230
8    146,047,418        16,376,870
9    68,529,747         241,930

Last but not least, let us solve the third and final problem of today’s tutorial – “How to read last N rows of a large csv file in Pandas?

Unfortunately, read_csv() does not facilitate us with any parameter that allows you to directly read the last N lines from a file. This can be a troublesome issue to handle when you are dealing with large datasets.

Thus, a workaround to this problem is to first find out the total number of lines/records in the file. Then use the skiprows parameter to directly jump to the row/line from which you want to select the records.

Code: In the following code snippet we will fetch the first 5 rows from the csv file into our DataFrame.

import pandas as pd


def num_of_lines(fname):
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1


num_lines = num_of_lines("countries.csv")
n = 5
df = pd.read_csv("countries.csv", skiprows=range(1, num_lines - n))
print(df)

Output:

     Country     Capital       Population        Area
0    USA         Washington    334,506,463       9,147,420
1    China       Beijing       1,449,357,022     9,388,211
2    Poland      Warsaw        37,771,789        306,230
3    Russia      Moscow        146,047,418       16,376,870
4    England     London        68,529,747        241,930

Conclusion

Phew! We have successfully solved all the problems that were presented to us in this tutorial.  I hope this tutorial helped you to sharpen your coding skills. Please stay tuned and subscribe for more interesting coding problems.

Recommended Reads:


Learn Pandas the Fun Way by Solving Code Puzzles

If you want to boost your Pandas skills, consider checking out my puzzle-based learning book Coffee Break Pandas (Amazon Link).

Coffee Break Pandas Book

It contains 74 hand-crafted Pandas puzzles including explanations. By solving each puzzle, you’ll get a score representing your skill level in Pandas. Can you become a Pandas Grandmaster?

Coffee Break Pandas offers a fun-based approach to data science mastery—and a truly gamified learning experience.