# 5 Best Ways to Create Scatter Plots and Color Mapping in Python

π‘ Problem Formulation: Scatter plots are crucial for visualizing the relationship between two numerical variables in data analysis. A common need is to color-map these points to represent an additional dimension, such as a category or a range of values. This article focuses on providing python-based solutions for generating scatter plots with color mapping, taking inputs as x and y coordinate arrays, and an additional array or parameter for color-coding each point, with the output being a colorful visualization of the data points.

## Method 1: Using Matplotlib

The Matplotlib library is a fundamental plotting library in Python. With `matplotlib.pyplot.scatter()`, datasets can be visualized as scatter plots in which each point’s color is controlled via the `c` parameter. This parameter accepts either a sequence of colors or a sequence of numeric values that are mapped to colors through a colormap.

Here’s an example:

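```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 5, 6, 7, 9]
y = [7, 4, 3, 5, 6, 1, 2]
sizes = [210, 410, 312, 214, 415, 312, 213]  # marker areas in points^2
colors = [0, 1, 2, 3, 4, 5, 6]  # numeric values mapped through the colormap

plt.scatter(x, y, s=sizes, c=colors, cmap='viridis')
plt.colorbar()  # relates colors to the values in `colors`
plt.show()
```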

The scatter plot displays points with coordinates given by `x` and `y` arrays, sizes according to the `sizes` array, and colored based on the `colors` array using the ‘viridis’ colormap. A color bar is also added to relate the colors to the values.
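The `c` parameter also accepts explicit colors, which is useful when the extra dimension is a category rather than a number. The following is a minimal sketch of that idea; the `groups` labels and the `palette` dictionary are hypothetical and only serve as an illustration.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 5, 6, 7, 9]
y = [7, 4, 3, 5, 6, 1, 2]
groups = ["A", "B", "A", "C", "B", "C", "A"]  # hypothetical category labels

# Hypothetical mapping from category label to a named Matplotlib color.
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}
point_colors = [palette[g] for g in groups]

fig, ax = plt.subplots()
ax.scatter(x, y, c=point_colors)

# Empty scatter calls act as legend handles, one per category.
for label, color in palette.items():
    ax.scatter([], [], color=color, label=label)
ax.legend(title="group")
```

Call `plt.show()` afterwards to display the figure. A legend replaces the color bar here because the colors stand for discrete categories, not a continuous range.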

## Method 2: Using Seaborn

Seaborn simplifies the creation of visually appealing and informative statistical graphics in Python. It builds on Matplotlib and provides a high-level interface for drawing scatter plots with the `sns.scatterplot()` function. Its main strengths are close integration with pandas DataFrames and automatic color mapping based on categorical or numeric columns.

Here’s an example:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'x': [5, 7, 8, 7, 2, 17, 2, 9],
    'y': [99, 86, 87, 88, 100, 86, 103, 87],
    'group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'C']
})

# hue selects the column whose values determine each point's color
sns.scatterplot(x='x', y='y', hue='group', data=df, palette='bright')
plt.show()
```

This Seaborn scatter plot uses a pandas DataFrame as input. Colors are automatically assigned to different groups, as indicated by the ‘hue’ parameter. The ‘palette’ parameter can be customized to change the color theme.
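The `hue` parameter is not limited to categories. As a rough sketch, the same call can color points by a numeric column when a continuous palette is given; the `score` column below is a hypothetical addition to the example data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "x": [5, 7, 8, 7, 2, 17, 2, 9],
    "y": [99, 86, 87, 88, 100, 86, 103, 87],
    # Hypothetical numeric column used only to illustrate numeric hue mapping.
    "score": [1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5],
})

# A numeric hue column combined with a continuous palette name.
ax = sns.scatterplot(data=df, x="x", y="y", hue="score", palette="viridis")
```

`sns.scatterplot()` returns the Matplotlib `Axes` it drew on, so the plot can be adjusted further before calling `plt.show()`.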

## Method 3: Using Plotly

Plotly’s Python graphing library creates interactive, publication-quality graphs. The `plotly.graph_objects.Scatter()` trace type produces interactive scatter plots whose markers can be color-mapped with color scales. Features such as zooming and hover tooltips for data points make it well suited to web-based presentation.

Here’s an example:

```python
import plotly.graph_objects as go

fig = go.Figure(data=go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 11, 12, 13],
    mode='markers',
    marker=dict(
        size=[40, 60, 80, 100],
        color=[0, 1, 2, 3],
        showscale=True
    )
))

fig.show()
```

The Plotly scatter plot visualizes points with interactive hover tooltips, zooming, and a color bar that relates marker colors to the values in `color`. It is well suited to creating a user-friendly data exploration experience on the web.

## Method 4: Using Pandas Plot

Pandas is primarily used for data manipulation, but it also supports basic plotting capabilities. Using the `DataFrame.plot.scatter()` method, a scatter plot can be quickly produced directly from a DataFrame. When the `c` parameter is supplied with a column name, it automatically maps the colors of the points according to the values in that column, much like Seaborn.

Here’s an example:

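```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'x': range(1, 6),
    'y': range(2, 11, 2),
    'color': range(1, 101, 20)
})

# c names the column whose values are mapped through the colormap
df.plot.scatter(x='x', y='y', c='color', colormap='viridis')
plt.show()
```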

Using Pandas, this scatter plot takes in the ‘x’ and ‘y’ series directly from the DataFrame and uses the ‘color’ series for color-coding, applied through the ‘viridis’ colormap.
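Because pandas draws with Matplotlib, the plot can be placed on an existing `Axes` and adjusted afterwards. The sketch below assumes the same example data; the title text, marker size, and grid styling are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": range(1, 6),
    "y": range(2, 11, 2),
    "color": range(1, 101, 20),
})

# Create the Axes first, then let pandas draw onto it via the ax argument.
fig, ax = plt.subplots(figsize=(6, 4))
df.plot.scatter(x="x", y="y", c="color", colormap="viridis", s=80, ax=ax)

# Ordinary Matplotlib calls still apply to the same Axes.
ax.set_title("Points colored by the 'color' column")
ax.grid(True, alpha=0.3)
```

After this runs, the figure holds two axes: the scatter plot itself and the color bar that pandas adds for the `color` column.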

## Bonus One-Liner Method 5: Using Matplotlib Pyplot Inline

Matplotlib’s `data` keyword lets you produce a colored scatter plot in a single line: pass column names as strings and the DataFrame as `data`, and Matplotlib looks up the columns for you. This is handy for simple graphs when exploring data. The example assumes `plt` and the DataFrame `df` from Method 4 are already defined.

Here’s an example:

`plt.scatter('x', 'y', c='color', data=df, cmap='viridis')`

This one-liner command uses Matplotlib’s Pyplot interface to generate a scatter plot with minimal code, yet includes the power of color mapping through its ‘cmap’ parameter.
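For completeness, here is a self-contained sketch of the one-liner, assuming the same DataFrame as in Method 4. The added color bar is optional and not part of the one-liner itself.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": range(1, 6),
    "y": range(2, 11, 2),
    "color": range(1, 101, 20),
})

# The strings "x", "y" and "color" are looked up as columns of `data`.
points = plt.scatter("x", "y", c="color", data=df, cmap="viridis")
plt.colorbar(points)  # optional: relate colors to the 'color' values
```

Finish with `plt.show()` to display the plot.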

## Summary/Discussion

• Method 1: Matplotlib. Highly customizable. Suitable for technical scientific papers. Can be verbose for complex plots.
• Method 2: Seaborn. Provides aesthetic defaults. Great for statistical analysis. Less flexible than Matplotlib for highly customized graphs.
• Method 3: Plotly. Creates interactive plots perfect for the web. Can be overkill for static or simple exploratory data analysis.
• Method 4: Pandas Plot. Convenient for quick plotting within the pandas workflow. Not as powerful as Matplotlib or Seaborn for customization.
• Bonus One-Liner Method 5: Matplotlib Inline. Great for rapid, concise code. Not suitable for detailed, customized visuals.