Method 1: Using the pivot_table()
Function
The pivot_table()
function in Pandas is a versatile method to create pivot tables from a DataFrame. It allows you to specify the data, the index/columns to group by, and the aggregation function to apply, such as sum
, mean
, or count
.
Here’s an example:
import pandas as pd # Sample data data = { 'Date': ['2021-01-01', '2021-01-02', '2021-01-01', '2021-01-02'], 'Region': ['North', 'West', 'East', 'South'], 'Product': ['A', 'B', 'C', 'D'], 'Sales': [100, 150, 200, 250] } df = pd.DataFrame(data) # Creating pivot table pivot_table = df.pivot_table(values='Sales', index='Region', columns='Date', aggfunc='sum')
The output of this code snippet:
Date 2021-01-01 2021-01-02 Region East 200 NaN North 100 NaN South NaN 250 West NaN 150
In this example, we’ve aggregated our sample sales data by regions and dates using the sum of sales as the values. This format quickly shows how each region performed on each date, filling missing combinations with NaN.
Method 2: Aggregating with Multiple Functions
The pivot_table()
function also supports multiple aggregation functions simultaneously, which can be useful for obtaining different summaries in one go.
Here’s an example:
# Using the same dataframe 'df' from Method 1 # Creating pivot table with multiple aggregation functions pivot_table = df.pivot_table(values='Sales', index='Region', aggfunc=['sum', 'mean'])
The output of this code snippet:
sum mean Region East 200 200.0 North 100 100.0 South 250 250.0 West 150 150.0
In this code, we are using both sum
and mean
functions to aggregate sales data by region. The output gives us both the total and average sales per region, providing a comparative perspective in one view.
Method 3: Adding Margins for Subtotals
Pandas pivot tables can include marginal subtotals by setting the margins
parameter to True
. This gives the sum of rows and columns, adding a grand total for quick insights.
Here’s an example:
# Using the same dataframe 'df' from Method 1 # Creating pivot table with margins pivot_table = df.pivot_table(values='Sales', index='Region', columns='Date', aggfunc='sum', margins=True)
The output of this code snippet:
Date 2021-01-01 2021-01-02 All Region East 200 NaN 200 North 100 NaN 100 South NaN 250 250 West NaN 150 150 All 300 400 700
By adding margins, the pivot table now includes the total sales for each region (column “All”), as well as for each date (row “All”). It also shows the grand total sales in the bottom-right corner, combining all data points.
Method 4: Filtering Data with query()
Before Pivot
Before creating a pivot table, you might want to filter your data. You can achieve this by using the query()
method, which selects data based on a query expression.
Here’s an example:
# Filtering data for one specific region, 'North' filtered_data = df.query("Region == 'North'") # Creating pivot table after filtering pivot_table = filtered_data.pivot_table(values='Sales', index='Product', aggfunc='sum')
The output of this code snippet:
Product A 100
In this snippet, we first used query()
to filter the DataFrame for sales in the ‘North’ region. Then we created a pivot table for this filtered dataset to obtain the sum of sales per product in the North region.
Bonus One-Liner Method 5: Quick Pivot with groupby()
and unstack()
While not a pivot table in the strictest sense, using groupby()
together with unstack()
achieves a similar result with succinct syntax.
Here’s an example:
# Quick pivot with groupby and unstack grouped_pivot = df.groupby(['Region', 'Date'])['Sales'].sum().unstack()
The output of this code snippet:
Date 2021-01-01 2021-01-02 Region East 200 NaN North 100 NaN South NaN 250 West NaN 150
This one-liner first groups data by ‘Region’ and ‘Date’ and then calculates the sum of ‘Sales’. The unstack()
method pivots the innermost index levels to columns, producing a layout similar to a pivot table.
Summary/Discussion
- Method 1:
pivot_table()
Function. Versatile and robust. Can require additional parameters to tweak. - Method 2: Multiple Aggregations. Provides a comprehensive view. May become complex with too many aggregations.
- Method 3: Margins. Useful for subtotal insights. Takes up extra space if unnecessary.
- Method 4: Filter with
query()
Before Pivot. Enables pre-pivot data curation. Extra step of filtering. - Method 5:
groupby()
+unstack()
. Quick and succinct. Not as flexible for complex pivoting tasks.