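In this tutorial, we cover the Pandas functions loc() and iloc(), which are used for data selection operations on dataframes. With the loc() function, we access a group of rows and/or columns by their labels, whereas the iloc() function accesses these groups by integer location. Although this tutorial refers to them as functions and writes loc() and iloc(), both are indexers that are used with square brackets, for example df.loc['fish'], rather than being called like ordinary functions.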
Getting values with the loc() function
Here are the allowed inputs as stated in the official documentation:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
- A list or array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f'.
- A Boolean array of the same length as the axis being sliced, e.g. [True, False, True].
- An alignable Boolean Series. The index of the key will be aligned before masking.
- An alignable Index. The Index of the returned selection will be the input.
- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)
First, we will have a look at the loc() function, using the following example:
import pandas as pd

data = {
    "speed": [7, 5, 8],
    "height": [1.0, 0.3, 0.1],
    "length": [1.2, 0.4, 0.2],
}
df = pd.DataFrame(data, index=["dog", "cat", "fish"])
df
The output DataFrame looks like so:
     | speed | height | length |
dog | 7 | 1.0 | 1.2 |
cat | 5 | 0.3 | 0.4 |
fish | 8 | 0.1 | 0.2 |
First, we import the Pandas library. Next, we assign the example data as a dictionary of lists to a variable called “data”. Then, we create a Pandas dataframe from “data”, using the animal names as index labels, and assign this new dataframe to a variable called “df”. Finally, we output the dataframe. The dataframe shows a dog’s, cat’s, and fish’s respective speed, height, and length.
Now, we apply the loc() function. There are many ways to use it, and which one we choose depends on what information we want to get.
Let’s say we want to get the values for a specific row:
df.loc['fish']
speed | 8.0 |
height | 0.1 |
length | 0.2 |
Name: fish, dtype: float64
Here, we want to get the values for the “fish” row. We achieve that by putting the row label “fish” into square brackets after loc. The output is a Series that shows each column value for the “fish” row. Additionally, we get the name of the row and the data type of the values. As we can see, the data type here is “float64”. If we have a look at the initial dataframe, we see that the “speed” column has the data type “int64”, because all of its values are integers. A Series holds a single data type, however, so when the row combines one integer with two floats, Pandas converts the integer to a float (the integer “8” turns into the float “8.0”) and reports “float64”.
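To make this conversion visible, we can compare the data types of the columns with the data type of a selected row. The following is a minimal sketch that rebuilds the example dataframe; the variable name fish_row is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# Inside the dataframe, every column keeps its own data type.
print(df.dtypes)

# A single row comes back as a Series, which has one data type for all
# of its values, so the integer speed is converted to a float.
fish_row = df.loc["fish"]
print(fish_row.dtype)
print(fish_row["speed"])
```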
Similar to the example from above, we can get the values from multiple rows as well:
df.loc[['dog', 'fish']]
The resulting DataFrame looks like this:
     | speed | height | length |
dog | 7 | 1.0 | 1.2 |
fish | 8 | 0.1 | 0.2 |
We pass the labels of the relevant rows to the loc() function, in this case the “dog” and the “fish” rows. Note that we use double square brackets here: the outer pair belongs to loc, and the inner pair creates the list of row labels. The output is a new dataframe rather than a Series, so it does not include the “Name” and “dtype” footer that we got when we used the loc() method with a single label.
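The difference between a single label and a list of labels can be checked directly by looking at the type of the result. This is a small sketch on the example data; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

one_row = df.loc["fish"]             # single label -> Series
one_row_frame = df.loc[["fish"]]     # list with one label -> DataFrame
two_rows = df.loc[["dog", "fish"]]   # list with two labels -> DataFrame

print(type(one_row).__name__)
print(type(one_row_frame).__name__)
print(two_rows.shape)
```

Even a list with only one label returns a dataframe, which is handy when later code expects a dataframe regardless of how many rows were selected.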
So far, we have seen how to use the loc() method to access values from specific rows. The same is possible for columns as well:
df.loc[:, 'height']
dog | 1.0 |
cat | 0.3 |
fish | 0.1 |
Name: height, dtype: float64
Before the comma, we determine which rows we want to show. The “:” says we want to get all rows. After the comma, we say which column or columns we want to show. Here, it’s just “height”. The output shows the “height” value for each row (“dog”, “cat”, and “fish”).
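The same pattern extends to several columns by passing a list after the comma. The sketch below also checks that selecting one column with loc gives the same Series as the plain df["height"] lookup; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# All rows, one column -> Series
height = df.loc[:, "height"]

# All rows, a list of columns -> DataFrame
two_columns = df.loc[:, ["height", "length"]]

print(height.equals(df["height"]))
print(two_columns)
```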
If we look for one specific value, for example, the speed of a fish, we do it this way:
df.loc['fish', 'speed']
The approach is the same as in the example before: we state the row before the comma and the column after the comma. By specifying exactly one row and one column, we get the value of that specific cell, which is “8” in this case.
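Row and column selections can be combined freely. As a sketch on the example data, the following reads a single cell and then a smaller block made of two rows and two columns; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# One row label and one column label -> a single value
fish_speed = df.loc["fish", "speed"]
print(fish_speed)

# A list of row labels and a list of column labels -> a smaller DataFrame
subset = df.loc[["dog", "cat"], ["speed", "length"]]
print(subset)
```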
Besides the use cases we have seen above, the loc() method provides us with even more possibilities. We can pass a condition to the loc() function to see only the data that meets this condition:
df.loc[df['speed']>6]
The output:
     | speed | height | length |
dog | 7 | 1.0 | 1.2 |
fish | 8 | 0.1 | 0.2 |
Inside the square brackets, we put in a condition. The code means we want to access all the rows that have a speed greater than 6. The output shows a dataframe containing the “dog” and “fish” rows only. The “cat” row is missing because the cat’s speed is only 5.
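It can help to look at the condition on its own: it evaluates to a Boolean Series with one True or False per row, and loc keeps the rows marked True. The sketch below shows that, plus one way to combine two conditions with the & operator, where each condition needs its own parentheses. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# The condition is a Boolean Series aligned with the rows of df.
mask = df["speed"] > 6
print(mask)

# loc keeps only the rows where the mask is True.
fast = df.loc[mask]
print(fast)

# Two conditions combined with "&" (logical and).
fast_and_small = df.loc[(df["speed"] > 6) & (df["height"] < 1)]
print(fast_and_small)
```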
Using conditionals within the loc() function is very powerful, since it is an easy and efficient way to get a lot of information out of our data with very little code.
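Some of the allowed inputs listed at the beginning of this section have not appeared in the examples so far. The following sketch tries three of them on the example data: a label slice, a plain Boolean list, and a callable. Note that a label slice includes both its start and its end label. The variable names and the lambda are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# Label slice: both "dog" and "cat" are included in the result.
label_slice = df.loc["dog":"cat"]
print(label_slice)

# Boolean list with one entry per row.
bool_list = df.loc[[True, False, True]]
print(bool_list)

# Callable: receives the dataframe and returns a valid indexer,
# here a Boolean Series.
via_callable = df.loc[lambda frame: frame["length"] < 1]
print(via_callable)
```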
Setting values with the loc() function
Besides getting values with the loc() method, we can also set values with it. For example, we might want to change the values of an entire row using the loc() function:
df.loc["dog"] = 9
The resulting DataFrame:
     | speed | height | length |
dog | 9 | 9.0 | 9.0 |
cat | 5 | 0.3 | 0.4 |
fish | 8 | 0.1 | 0.2 |
We put the row label inside the loc() function and assign “9” to it. This way, we set all values in that row to “9”. Note that each column keeps its data type: in the “speed” column, the “9” is stored as an integer since the column’s data type is “int64”, whereas in the “height” and “length” columns the “9” is stored as the float “9.0” because the columns’ data type is “float64”. Also, observe that we need to output the dataframe ourselves after changing values. That’s because an assignment changes the dataframe in place but does not display it.
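As a sketch of the row assignment on a fresh copy of the example data, the code below sets the row, prints the dataframe explicitly, and then prints the column data types so that we can check them after the assignment.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# The assignment modifies df in place and displays nothing by itself.
df.loc["dog"] = 9

# So we output the dataframe and its data types explicitly.
print(df)
print(df.dtypes)
```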
Setting new values works in the same way as getting values. We just have to specify the value we want to assign to the accessed group or groups. That means we can use all the selection methods from the “Getting values with the loc() function” section to set new values as well. To give you some more examples, let’s have a look at how to set values for an entire column:
df.loc[:, "speed"] = 11
Here’s the resulting DataFrame df:
     | speed | height | length |
dog | 11 | 9.0 | 9.0 |
cat | 11 | 0.3 | 0.4 |
fish | 11 | 0.1 | 0.2 |
The selection here is the same as when getting the values of a whole column; in addition, we assign a value to it. In this case, we change every entry in the “speed” column to “11”, as we can see in the resulting dataframe.
Another example could be that we want to change values based on a condition:
df.loc[df["height"] < 1, "height"] = 4
     | speed | height | length |
dog | 11 | 9.0 | 9.0 |
cat | 11 | 4.0 | 0.4 |
fish | 11 | 4.0 | 0.2 |
Here, we use conditionals like before in the “Getting values” section. We state that for every row in which the height is smaller than 1, we want to change the value in the respective “height” column to “4”. If we compare this new outputted dataframe to the one from the code example before, we see that indeed all “height” values smaller than 1 were turned into “4”. But since the “height” column’s data type is float, the “4” is converted to “4.0”.
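Because setting values changes the dataframe in place, it can be useful to work on a copy when the original values are still needed. The sketch below applies the same conditional assignment to a copy of the initial example data (so the heights are still 1.0, 0.3, and 0.1); using copy() and the name updated are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# Work on a copy so that df keeps its original values.
updated = df.copy()

# Rows are chosen by the condition, the column by its label.
updated.loc[updated["height"] < 1, "height"] = 4

print(updated)
print(df)
```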
Getting values with the iloc() function
Here are the allowed inputs as stated in the official documentation:
- An integer, e.g. 5.
- A list or array of integers, e.g. [4, 3, 0].
- A slice object with ints, e.g. 1:7.
- A Boolean array.
- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
The iloc() function gets rows and columns at integer locations. To see what that means, let’s continue with the example we have used before with the loc() function:
df
     | speed | height | length |
dog | 11 | 9.0 | 9.0 |
cat | 11 | 4.0 | 0.4 |
fish | 11 | 4.0 | 0.2 |
To access the first row, we use the iloc() function like this:
df.iloc[0]
speed | 11.0 |
height | 9.0 |
length | 9.0 |
Name: dog, dtype: float64
Inside the iloc() function, we put in “0” to state that we want to get the first row. The output shows the entries of the “dog” row for each column. We also get the name of the row and the data type, which is “float64” in this case.
Just like with the loc() function, we can alternatively pass a list to get the first row as a dataframe:
df.iloc[[0]]
     | speed | height | length |
dog | 11 | 9.0 | 9.0 |
We pass a list containing the positions of all the rows we want to access to the iloc() function. In this case, the list has only one entry, namely “0”.
Moreover, we are also able to access specific columns with the iloc() method:
df.iloc[:, 2]
dog | 9.0 |
cat | 0.4 |
fish | 0.2 |
Name: length, dtype: float64
This approach works the same as with the loc() function. Before the comma, we put in the “:” to say we want to access all rows. After the comma, we put in “2” to say we want to access the third column (remember: we start counting at “0”, so “2” means the third element). The output shows all values of the “length” column and the column’s data type.
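The remaining input types from the list at the start of this section work along the same lines. The sketch below uses the initial example data and tries an integer slice, a list of positions, a Boolean list, and a callable. Unlike label slices with loc, an integer slice with iloc excludes its stop position, as in ordinary Python slicing. The variable names and the lambda are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# Integer slice: positions 0 and 1, position 2 is excluded.
first_two = df.iloc[0:2]

# List of positions, returned in the order given.
picked = df.iloc[[2, 0]]

# Boolean list with one entry per row.
bool_rows = df.iloc[[True, False, True]]

# Callable that returns a list of positions.
via_callable = df.iloc[lambda frame: [0, 2]]

# Rows and columns together: first two rows, second and third column.
block = df.iloc[0:2, 1:3]

print(first_two)
print(picked)
print(block)
```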
Setting values with the iloc() function
Setting new values within our dataframe using the iloc() function works essentially the same as with the loc() function. We state the rows or columns we want to access and assign a new value to them:
df.iloc[0] = 3
Now, the DataFrame df looks like so:
     | speed | height | length |
dog | 3 | 3.0 | 3.0 |
cat | 11 | 4.0 | 0.4 |
fish | 11 | 4.0 | 0.2 |
In this instance, we access the first row and set that row’s values equal to “3”. The output shows a dataframe with the first row containing only values that equal “3” and whose data types are adapted to each column’s respective data type.
Let’s have a look at one additional example. We set a new value for a specific cell:
df.iloc[1,2] = 7
     | speed | height | length |
dog | 3 | 3.0 | 3.0 |
cat | 11 | 4.0 | 7.0 |
fish | 11 | 4.0 | 0.2 |
First, we access the element in the second row (determined by the “1” before the comma inside the iloc() function) and in the third column (determined by the “2” after the comma). The accessed value is the cat’s length. Then we set that value to “7” and output the new dataframe. As we can see, the value for the cat’s length indeed changed to 7.
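To confirm that a position really maps to the cell we have in mind, we can set values by position and read them back by label. This sketch starts from the initial example data rather than the modified dataframe above, so the untouched values differ from the tables in this section.

```python
import pandas as pd

df = pd.DataFrame(
    {"speed": [7, 5, 8], "height": [1.0, 0.3, 0.1], "length": [1.2, 0.4, 0.2]},
    index=["dog", "cat", "fish"],
)

# Position 0 is the "dog" row.
df.iloc[0] = 3

# Row position 1 is "cat", column position 2 is "length".
df.iloc[1, 2] = 7

# Read the same cells back by label.
print(df.loc["dog"])
print(df.loc["cat", "length"])
```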
The differences between loc() and iloc()
To see how the loc() function and the iloc() function differ from each other, let’s have a look at another code example:
data = ["dog", "rabbit", "whale", "shark", "bird", "cat"]
series = pd.Series(data, index=[4, 5, 6, 1, 2, 3])
series
4 | dog |
5 | rabbit |
6 | whale |
1 | shark |
2 | bird |
3 | cat |
dtype: object
First, we have a dataset as a list containing several animals. Then, we create a Pandas series with indexes in the order: 4,5,6,1,2,3. Finally, we output the series.
Now, let’s see what the loc() function accesses when we do this:
series.loc[1] # Output: 'shark'
The loc() function accesses “shark” because the shark’s index label is “1”, as we can see in the series above. Next, let’s have a look at the iloc() variant:
series.iloc[1] # Output: 'rabbit'
The iloc() function outputs “rabbit” when we put in “1”. That’s because the function accesses the location, not the label. Location “1” is the second row, so the value “rabbit” is accessed, whereas the label “1” belongs to the fourth row, which contains the value “shark”. So if we are looking for a specific index label, the loc() function is the way to go. With the iloc() function, on the other hand, we can always access a specific row or column by its integer position, no matter how the index is labeled.
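The same difference shows up with slices and with values that exist as a position but not as a label. The sketch below reuses the series from above; the variable names and the try/except check are additions for illustration.

```python
import pandas as pd

data = ["dog", "rabbit", "whale", "shark", "bird", "cat"]
series = pd.Series(data, index=[4, 5, 6, 1, 2, 3])

# Label slice: from label 1 to label 3, both included.
label_slice = series.loc[1:3]
print(label_slice.tolist())

# Position slice: positions 1 and 2, position 3 is excluded.
position_slice = series.iloc[1:3]
print(position_slice.tolist())

# Position 0 exists, but there is no label 0 in the index.
print(series.iloc[0])
try:
    series.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)
```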
Summary
All in all, the two functions loc() and iloc() are very powerful tools to access and change our data, and they can be used in many ways. We can access specific rows, columns, and individual cells, and even apply conditionals to access and change the data. Which function to use depends on whether we want to access our data by label (loc() function) or by location (iloc() function). If you want to read more about these functions, I recommend you read the official loc() function documentation or the official iloc() function documentation. For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter Blog page.
Happy Coding!