Problem Formulation
Given three arrays:
- The first two arrays
xandyof lengthncontain the(x_i, y_i)data of a 2D coordinate system. - The third array
cprovides categorical label information so we essentially getndata bundles(x_i, y_i, c_i)for an arbitrary number of categoriesc_i.
π¬ Question: How to plot the data so that (x_i, y_i) and (x_j, y_j) with the same category c_i == c_j have the same color?
Solution: Use Pandas groupby() and Call plt.plot() Separately for Each Group
To plot data by category, you iterate over all groups separately by using the data.groupby() operation. For each group, you execute the plt.plot() operation to plot only the data in the group.
In particular, you perform the following steps:
- Use the
data.groupby("Category")function assuming that data is a Pandas DataFrame containing thex,y, andcategorycolumns for n data points (rows). - Iterate over all
(name, group)tuples in the grouping operation result obtained from step one. - Use
plt.plot(group["X"], group["Y"], marker="o", linestyle="", label=name)to plot each group separately using thex,ydata andnameas a label.
Here’s what that looks like in code:
import pandas as pd
import matplotlib.pyplot as plt
# Generate the categorical data
x = [1, 2, 3, 4, 5, 6]
y = [42, 41, 40, 39, 38, 37]
c = ['a', 'b', 'a', 'b', 'b', 'a']
data = pd.DataFrame({"X": x, "Y": y, "Category": c})
print(data)
# Plot data by category
groups = data.groupby("Category")
for name, group in groups:
plt.plot(group["X"], group["Y"], marker="o", linestyle="", label=name)
plt.legend()
plt.show()Before I show you how the resulting plot looks, allow me to show you the data output from the print() function. Here’s the output of the categorical data:
X Y Category 0 1 42 a 1 2 41 b 2 3 40 a 3 4 39 b 4 5 38 b 5 6 37 a
Now, how does the colored category plot look like? Here’s how:

If you want to learn more about Matplotlib, feel free to check out our full blog tutorial series: