This tutorial shows you how to group the inner lists of a Python list of lists by a common element. There are three basic methods:
- Group the inner lists by a common element.
- Group the inner lists by a common element and aggregate them (e.g., by averaging).
- Group the inner lists by a common element and aggregate them (e.g., by averaging) using the external Pandas library.
Before we explore these three options in more detail: the quickest solution uses the Pandas library (see Method 3). If you want to learn about the Pythonic alternatives or you need a few more explanations, then read on!
Method 1: Group List of Lists By Common Element in Dictionary
Problem: Given a list of lists, group the inner lists by a common element and store the result in a dictionary (key = common element).
Example: Say, you’ve got a database with multiple rows (the list of lists) where each row consists of three attributes: Name, Age, and Income. You want to group by Name and store the result in a dictionary. The dictionary keys are given by the Name attribute. The dictionary values are a list of rows that have this exact Name attribute.
Solution: Here’s the data and how you can group by a common attribute (e.g., Name).
# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:
    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []
    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
#  'Bob': [[18, 22000]],
#  'Ann': [[26, 88000]]}
You can see that the result is a dictionary with one key per name ('Alice', 'Bob', and 'Ann'). Alice appears in two rows of the original database (list of lists). Thus, two rows are associated with her name, each keeping only the Age and Income attributes.
The strategy is simple:
- Create an empty dictionary.
- Go over each row in the list of lists. The first value of each row is the Name attribute.
- Add the Name attribute row[0] to the dictionary if it doesn’t exist yet, initializing its value to the empty list. Now you can be sure that the key exists in the dictionary.
- Append the sublist slice [Age, Income] to the dictionary value, so that the value becomes a list of lists as well, with one list per database row.
- You’ve now grouped all database entries by a common attribute (Name).
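The steps above can also be written more compactly. The following is a minimal sketch of an alternative, not the code used in this tutorial: it assumes you are happy to use collections.defaultdict from the standard library, and the variable names grouped, name, and attributes are chosen here for illustration only.

```python
from collections import defaultdict

# Same sample data as above: row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# defaultdict(list) creates the empty list on first access,
# so the explicit "if row[0] not in d" check is not needed.
grouped = defaultdict(list)
for name, *attributes in rows:
    # name is row[0]; attributes is the list row[1:]
    grouped[name].append(attributes)

print(dict(grouped))
# {'Alice': [[19, 45000], [33, 118000]], 'Bob': [[18, 22000]], 'Ann': [[26, 88000]]}
```

The result has the same content as the dictionary d built with the explicit membership check. The trade-off is that a defaultdict silently creates an entry whenever you look up a missing key, which may or may not be what you want later on.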
So far, so good. But what if you want to perform some aggregation on the grouped database rows?
Method 2: Group List of Lists By Common Element and Aggregate Grouped Elements
Problem: In the previous example, you’ve seen that each dictionary value is a list of lists because you store each row as a separate list. But what if you want to aggregate all grouped rows?
Example: The dictionary entry for the key 'Alice' may be [[19, 45000], [33, 118000]], but you want to average the age and income values: [(19+33)/2, (45000+118000)/2]. How do you do that?
Solution: Add one post-processing step after the above code that aggregates all attributes using the zip() function, as follows. Note that this is the exact same code as before (without aggregation), with three lines added at the end that aggregate the list of lists for each grouped Name into a single list of averages.
# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# Create a dictionary grouped by Name
d = {}
for row in rows:
    # Add name to dict if not exists
    if row[0] not in d:
        d[row[0]] = []
    # Add all non-Name attributes as a new list
    d[row[0]].append(row[1:])

print(d)
# {'Alice': [[19, 45000], [33, 118000]],
#  'Bob': [[18, 22000]],
#  'Ann': [[26, 88000]]}

# AGGREGATION FUNCTION:
for key in d:
    d[key] = [sum(x) / len(x) for x in zip(*d[key])]

print(d)
# {'Alice': [26.0, 81500.0], 'Bob': [18.0, 22000.0], 'Ann': [26.0, 88000.0]}
In the code, you use the aggregation function sum(x) / len(x) to calculate the average value for each attribute of the grouped rows. You can replace this part with your own aggregation function, such as variance, length, minimum, or maximum.
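To illustrate how the aggregation part can be swapped out, here is a small sketch. The helper aggregate() is hypothetical (it is not part of the tutorial code or of any library), and the dictionary d is written out by hand in the shape the grouping step produces. Unlike the loop above, this sketch returns a new dictionary instead of overwriting d.

```python
# Grouped data in the shape produced by the grouping step
d = {'Alice': [[19, 45000], [33, 118000]],
     'Bob': [[18, 22000]],
     'Ann': [[26, 88000]]}

def aggregate(groups, func):
    """Apply func to each attribute column of every group (hypothetical helper)."""
    return {key: [func(column) for column in zip(*group_rows)]
            for key, group_rows in groups.items()}

print(aggregate(d, max))
# {'Alice': [33, 118000], 'Bob': [18, 22000], 'Ann': [26, 88000]}

print(aggregate(d, min))
# {'Alice': [19, 45000], 'Bob': [18, 22000], 'Ann': [26, 88000]}

print(aggregate(d, len))
# {'Alice': [2, 2], 'Bob': [1, 1], 'Ann': [1, 1]}
```

Any function that accepts a sequence of numbers and returns a single value can be passed as func, including your own.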
Explanation:
- You go over each key in the dictionary (the Name attribute) and aggregate the list of lists into a flat list of averaged attributes.
- You zip the attributes together. For example, zip(*d['Alice']) becomes [[19, 33], [45000, 118000]] (conceptually).
- You iterate over each list x of this list of lists in the list comprehension statement.
- You aggregate the grouped attributes using your own custom function (e.g., sum(x) / len(x) to average the attribute values).
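The zip step is easier to follow in isolation. The sketch below uses only Alice's two rows; the names alice_rows, columns, and averages are chosen here for illustration. Note that zip() actually yields tuples rather than lists, which makes no difference to sum() and len().

```python
alice_rows = [[19, 45000], [33, 118000]]

# zip(*alice_rows) is the same as zip([19, 45000], [33, 118000]):
# it pairs up the first elements, then the second elements.
columns = list(zip(*alice_rows))
print(columns)
# [(19, 33), (45000, 118000)]

# Average each column of grouped attributes
averages = [sum(x) / len(x) for x in columns]
print(averages)
# [26.0, 81500.0]
```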
Method 3: Pandas GroupBy
The Pandas library has its own powerful implementation of the groupby() function. Have a look at the code first:
import pandas as pd

# Database:
# row = [Name, Age, Income]
rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

df = pd.DataFrame(rows)
print(df)
'''
       0   1       2
0  Alice  19   45000
1    Bob  18   22000
2    Ann  26   88000
3  Alice  33  118000
'''

print(df.groupby([0]).mean())
'''
          1        2
0
Alice  26.0  81500.0
Ann    26.0  88000.0
Bob    18.0  22000.0
'''
Explanation:
- Import the pandas library.
- Create a DataFrame object from the rows. Think of it as an Excel spreadsheet in your code (with numbered rows and columns).
- Call the groupby() method on your DataFrame. Use the column label [0] (which is the Name attribute) to group your data. This creates a DataFrameGroupBy object.
- On the DataFrameGroupBy object, call the mean() method or any other aggregation method you want.
- The result is the “spreadsheet” with grouped Name attributes, where multiple rows with the same Name attribute are averaged (element-wise).
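As a variation, you may prefer named columns over the numeric labels 0, 1, and 2. The following sketch assumes the column names 'Name', 'Age', and 'Income' (taken from the row description above, not from the original code) and also shows agg() with a different aggregation per column. The variable names averages and summary are illustrative.

```python
import pandas as pd

rows = [['Alice', 19, 45000],
        ['Bob', 18, 22000],
        ['Ann', 26, 88000],
        ['Alice', 33, 118000]]

# Name the columns instead of relying on the default labels 0, 1, 2
df = pd.DataFrame(rows, columns=['Name', 'Age', 'Income'])

# Average Age and Income per Name
averages = df.groupby('Name').mean()
print(averages)

# Use a different aggregation per column: oldest Age, total Income
summary = df.groupby('Name').agg({'Age': 'max', 'Income': 'sum'})
print(summary)
```

With named columns you can look up a single result directly, for example averages.loc['Alice', 'Age'] for Alice's average age.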