5 Best Ways to Display a Scatter Plot in Python Using Seaborn


💡 Problem Formulation: Data visualization is a pivotal step in data analysis and machine learning. For Python users, especially those dealing with statistical data, a concise and visually pleasing representation of data can provide significant insights. This article shows how to use the Seaborn library to display a scatter plot, which visualizes the relationship between two variables, for instance, ‘Age’ and ‘Income’ across a dataset.

Method 1: Basic Scatter Plot

Seaborn’s scatterplot() function is the most straightforward way to create scatter plots. At a minimum, pass the x and y variables that give the data points’ coordinates on the plot. This function is ideal for quick inspections of the data distribution and potential correlations.

Here’s an example:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
age = [25, 30, 35, 40, 45]
income = [40000, 50000, 60000, 80000, 75000]

# Create the scatter plot
sns.scatterplot(x=age, y=income)

# Show the plot
plt.show()

The output is a scatter plot with the age values on the x-axis and the income values on the y-axis. Because plain lists carry no names, the axes themselves are left unlabeled.

This code snippet creates and displays a basic scatter plot. The x and y parameters are provided as lists representing age and income data points respectively. The sns.scatterplot() function then plots the points, and plt.show() displays the plot.
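When the data lives in a pandas DataFrame, the same function can take column names instead of lists. The following is a minimal sketch, assuming pandas is installed; the column names ‘Age’ and ‘Income’ are illustrative choices, not part of the example above. With the data parameter set, Seaborn uses the column names as axis labels.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same sample values as above, held in a DataFrame
# (the column names are illustrative assumptions)
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Income": [40000, 50000, 60000, 80000, 75000],
})

# With data=, x and y refer to columns by name,
# and those names become the axis labels
ax = sns.scatterplot(data=df, x="Age", y="Income")

# Show the plot
plt.show()
```

This form tends to scale better than separate lists once more variables are added, since every parameter can then refer to a column of the same DataFrame.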

Method 2: Scatter Plot with Hue

Seaborn allows for the addition of a ‘hue’ parameter which adds a color-coded dimension to the data points. This can be used to represent a third variable, typically a categorical one, providing more context to the scatter plot and helping in distinguishing different categories graphically.

Here’s an example:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data with an additional categorical variable
age = [25, 30, 35, 40, 45]
income = [40000, 50000, 60000, 80000, 75000]
gender = ['Female', 'Male', 'Female', 'Male', 'Female']

# Create a scatter plot with a categorical hue
sns.scatterplot(x=age, y=income, hue=gender)

# Show the plot
plt.show()

The output is a scatter plot with age and income as before, but now with points color-coded by gender.

Here, we’ve added a ‘gender’ list that serves as another dimension of our data. By specifying the hue parameter in the sns.scatterplot() function, Seaborn automatically assigns different colors to the distinct categories in the gender list, enhancing the visualization with minimal additional coding.
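The colors and their order can also be set explicitly. The sketch below uses the palette and hue_order parameters of scatterplot(); the choice of Seaborn’s built-in "colorblind" palette and of listing ‘Male’ first is purely illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data (same values as the example above)
age = [25, 30, 35, 40, 45]
income = [40000, 50000, 60000, 80000, 75000]
gender = ['Female', 'Male', 'Female', 'Male', 'Female']

# palette picks the colors; hue_order fixes which category
# gets the first color and appears first in the legend
ax = sns.scatterplot(x=age, y=income, hue=gender,
                     hue_order=['Male', 'Female'],
                     palette='colorblind')

# Show the plot
plt.show()
```

A palette designed for color vision deficiencies is one way to make a hue-only encoding easier to read for more viewers.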

Method 3: Scatter Plot with Style

Beyond just color, Seaborn’s scatterplot() function also allows us to differentiate data points with different markers using the ‘style’ parameter. Each category can be represented by a distinct marker style, which can be very useful when the colors are not enough or to make the plot more accessible.

Here’s an example:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
age = [25, 30, 35, 40, 45]
income = [40000, 55000, 60000, 80000, 75000]
gender = ['Female', 'Male', 'Female', 'Male', 'Female']

# Create a scatter plot with style distinction
sns.scatterplot(x=age, y=income, hue=gender, style=gender)

# Show the plot
plt.show()

The output will be a scatter plot where each gender not only has a different color but also a distinct marker style.

In this code snippet, the style argument is set to the same ‘gender’ list as the hue. This dual encoding means that Seaborn will give each gender a unique marker shape and color, increasing the plot’s readability, especially when printed in black and white or viewed by individuals with color vision deficiencies.
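If the default shapes are not distinct enough, the markers parameter accepts a dictionary that maps each category to a Matplotlib marker code. This is a minimal sketch; the specific shapes (circle and square) and the point size of 100 are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data (same values as the example above)
age = [25, 30, 35, 40, 45]
income = [40000, 55000, 60000, 80000, 75000]
gender = ['Female', 'Male', 'Female', 'Male', 'Female']

# Map each category to an explicit Matplotlib marker code
marker_map = {'Female': 'o', 'Male': 's'}

# s=100 enlarges every point so the shapes are easier to tell apart
ax = sns.scatterplot(x=age, y=income, hue=gender, style=gender,
                     markers=marker_map, s=100)

# Show the plot
plt.show()
```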

Method 4: Scatter Plot with Size

If there is another quantitative variable in the dataset, it’s possible to represent it on a scatter plot by adjusting the size of the data points with the ‘size’ parameter. This method is particularly useful when you want to highlight the magnitude of a variable in relation to the x and y dimensions.

Here’s an example:

import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
age = [25, 30, 35, 40, 45]
income = [40000, 55000, 60000, 80000, 75000]
gender = ['Female', 'Male', 'Female', 'Male', 'Female']
savings = [5000, 8000, 20000, 30000, 15000]

# Create a scatter plot with varying sizes
sns.scatterplot(x=age, y=income, hue=gender, size=savings)

# Show the plot
plt.show()

Here, the output is a scatter plot where the size of each point scales with the ‘savings’ variable, while the color still indicates gender.

The new ‘savings’ list represents another attribute of our dataset. By using the size parameter, we let Seaborn automatically adjust the size of each point on the plot based on the ‘savings’ values. Larger savings values produce larger points, providing an additional layer of information.
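The range of point sizes can be controlled with the sizes parameter. In the sketch below, a (minimum, maximum) tuple is passed so that the smallest savings value is drawn at the minimum size and the largest at the maximum; the values 40 and 400 are arbitrary illustrative choices, and hue is left out to keep the focus on size.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data (same values as the example above)
age = [25, 30, 35, 40, 45]
income = [40000, 55000, 60000, 80000, 75000]
savings = [5000, 8000, 20000, 30000, 15000]

# sizes=(min, max) sets the marker-size range that the
# savings values are normalized into
ax = sns.scatterplot(x=age, y=income, size=savings, sizes=(40, 400))

# Show the plot
plt.show()
```

A wider range makes differences in the size variable easier to see, at the cost of more overlap between neighboring points.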

Bonus Method 5: Interactive Scatter Plot with Size and Color

Seaborn itself produces static Matplotlib figures. For an interactive alternative, Python’s Plotly library offers a similar high-level interface through Plotly Express, with hover effects, zooming, and panning that enhance the data exploration experience. Note that this example uses Plotly on its own rather than Seaborn.

Here’s an example:

import plotly.express as px

# Sample data
data = {'Age': [25, 30, 35, 40, 45],
        'Income': [40000, 55000, 60000, 80000, 75000],
        'Gender': ['Female', 'Male', 'Female', 'Male', 'Female'],
        'Savings': [5000, 8000, 20000, 30000, 15000]}

# Create an interactive scatter plot
fig = px.scatter(data, x="Age", y="Income",
                 size="Savings", color="Gender",
                 hover_name="Gender", size_max=60)

# Show the plot
fig.show()

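This snippet stores the sample dataset in a dictionary, ‘data’, and passes it to Plotly Express’s scatter() function to create an interactive scatter plot. The size and color arguments play the same roles as Seaborn’s size and hue, size_max caps the size of the largest marker, and hover_name sets the title of each point’s tooltip. Hover information, zooming, and panning make Plotly a powerful option for data exploration.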

Summary/Discussion

Method 1: Basic Scatter Plot. Strengths: Simple and quick to implement. Weaknesses: Limited in representing additional data dimensions.
Method 2: Scatter Plot with Hue. Strengths: Adds a color-coded dimension for categorical data. Weaknesses: Not ideal for color-blind readers without further customization.
Method 3: Scatter Plot with Style. Strengths: Introduces marker styles for better distinction. Weaknesses: Complexity increases with the addition of extra variables.
Method 4: Scatter Plot with Size. Strengths: Visualizes an additional quantitative variable. Weaknesses: Can become cluttered if not handled carefully.
Bonus Method 5: Interactive Scatter Plot. Strengths: Enhanced data exploration with interactivity. Weaknesses: Requires additional libraries and is more complex.