5 Best Ways to Use Bokeh to Generate Scatter Plots in Python

💡 Problem Formulation: In Python data visualization, a scatter plot is a common way to explore the relationship between two sets of data. Suppose you have two arrays of data points, one for the x-axis and another for the y-axis, and you want to visualize them to discern patterns or correlations. The desired output is an interactive scatter plot that is easy to interpret and manipulate.

Method 1: Using figure()

Bokeh’s figure() function provides a simple, straightforward way to create scatter plots. It initializes a plot and accepts optional parameters that customize the presentation, such as tools, width, and title. The circle() glyph method is then commonly used to draw the scatter points. In Bokeh 3.4 and later, circle() with a size argument is deprecated in favor of scatter(), which accepts the same size, color, and alpha arguments.

Here’s an example:

from bokeh.plotting import figure, show, output_file

# Sample data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Output to static HTML file
output_file("scatter_plot.html")

# Create a new plot with a title and axis labels
p = figure(title="Simple scatter plot example", x_axis_label='X', y_axis_label='Y')

# Add a circle renderer with size, color, and alpha
p.circle(x, y, size=20, color="navy", alpha=0.5)

# Show the results
show(p)

The output is a scatter plot in an HTML file named “scatter_plot.html” with navy blue circles representing the data points.

This code snippet creates a basic scatter plot using Bokeh’s bokeh.plotting interface. Here, output_file() sets the name of the HTML file to write, figure() initializes the plot, and circle() adds the scatter glyphs. The final show() call saves the file and opens the plot in the browser.
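figure() accepts more keyword arguments than the example above uses. The following is a minimal sketch, assuming a recent Bokeh release (2.4 or later, where width and height replace the older plot_width and plot_height), that sets the plot size and toolbar explicitly and draws the points with scatter(), the non-deprecated alternative to circle(size=...). The title and tool list are illustrative choices.

```python
from bokeh.plotting import figure

# Sample data (same as in the example above)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Configure size, title, axis labels and toolbar when the figure is created
p = figure(
    width=400,
    height=300,
    title="Configured scatter plot",
    x_axis_label="X",
    y_axis_label="Y",
    tools="pan,wheel_zoom,box_zoom,reset,save",
)

# scatter() draws circles by default and takes the same size/color/alpha
# arguments that circle() takes in the example above
renderer = p.scatter(x, y, size=15, color="navy", alpha=0.5)

# Nothing is displayed or written yet; call show(p) or save(p) when ready
print(p.width, p.height, len(p.renderers))
```

Keeping the configuration in the figure() call means the same plot object can later be passed to show(), save(), or a layout without further changes.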

Method 2: Customizing the scatter markers

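Bokeh provides numerous marker types for customizing scatter plots. Glyph methods such as asterisk(), cross(), and square() change the shape of the markers for better visual distinction or aesthetic preference. In Bokeh 3.4 and later these per-marker methods are deprecated; the equivalent call is scatter() with a marker argument, for example scatter(x, y, marker="asterisk").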

Here’s an example:

from bokeh.plotting import figure, show, output_file

# Sample data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Output to static HTML file
output_file("custom_scatter_plot.html")

# Create a new plot
p = figure(title="Custom scatter plot example")

# Add custom markers
p.asterisk(x, y, size=20, color="green")

# Show the results
show(p)

The output is a scatter plot with green asterisk markers.

This code snippet demonstrates how to create a customized scatter plot by using a different glyph method, in this case asterisk(). The rest of the process is the same as in Method 1; only the marker type changes.
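Marker shapes do not have to be fixed for a whole call. The sketch below, assuming a ColumnDataSource with a hypothetical group column, uses bokeh.transform.factor_mark and factor_cmap so that a single scatter() call picks the marker shape and color for each row from that column.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, factor_mark

# Hypothetical data: each point belongs to one of two illustrative groups
source = ColumnDataSource(data={
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 2, 4, 5],
    "group": ["a", "b", "a", "b", "a"],
})
groups = ["a", "b"]

p = figure(title="Marker per group")

# One renderer; marker and color are looked up per row from "group":
# "a" -> green asterisk, "b" -> navy square
p.scatter(
    "x", "y", source=source, size=15,
    marker=factor_mark("group", ["asterisk", "square"], groups),
    color=factor_cmap("group", ["green", "navy"], groups),
    legend_group="group",
)
```

Mapping markers from a column keeps the marker choice in the data, which scales better than one glyph call per marker type when there are many groups.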

Method 3: Adding interactivity with hover tools

Interactivity can be added to scatter plots in Bokeh using the HoverTool, which allows one to display additional data about each point (like its coordinates or any other metadata) when hovered over with the cursor.

Here’s an example:

from bokeh.models import HoverTool
from bokeh.plotting import figure, show, output_file

# Sample data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Output to static HTML file
output_file("interactive_scatter_plot.html")

# Create a new plot
p = figure(title="Interactive scatter plot example")

# Add circle renderer with a size, color, and alpha
p.circle(x, y, size=20, color="red", alpha=0.5)

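# Create a HoverTool that shows each point's index and data coordinates.
# "@x" and "@y" read the values from the glyph's data source,
# whereas "$x" and "$y" would report the cursor position instead.
hover = HoverTool(tooltips=[
    ("Index", "$index"),
    ("(x, y)", "(@x, @y)"),
])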

# Add the HoverTool to the figure
p.add_tools(hover)

# Show the results
show(p)

The output is an interactive scatter plot that displays index and coordinates when a point is hovered over.

The code above adds an interactive component to the scatter plot. The HoverTool is added to the plot, configured with a tooltip to show the index and coordinates of the data points when hovered over, enhancing the plot’s interactivity and usefulness.
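Tooltips can only show fields that exist in the glyph’s data source. The following sketch assumes hypothetical label and size columns stored in a ColumnDataSource, so the tooltip can display per-point metadata and the marker size. It also restricts the hover tool to this one renderer through the renderers argument.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical metadata: a label and a marker size for each point
source = ColumnDataSource(data={
    "x": [1, 2, 3, 4, 5],
    "y": [6, 7, 2, 4, 5],
    "label": ["A", "B", "C", "D", "E"],
    "size": [10, 15, 20, 25, 30],
})

p = figure(title="Hover with metadata")

# Marker size is read from the "size" column of the source
points = p.scatter("x", "y", size="size", source=source, color="red", alpha=0.5)

# "@column" fields are looked up in the source for the hovered point
hover = HoverTool(
    renderers=[points],
    tooltips=[
        ("Label", "@label"),
        ("(x, y)", "(@x, @y)"),
        ("Size (px)", "@size"),
    ],
)
p.add_tools(hover)
```

Limiting the hover tool to specific renderers becomes useful once a plot has several glyphs and only some of them should produce tooltips.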

Method 4: Linking plots

Bokeh allows the linkage of multiple plots, which is especially useful when visualizing multi-faceted datasets. Linking plots can enable coordinated zooming or panning, allowing for simultaneous exploration of multiple scatter plots.

Here’s an example:

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file

# Sample data
x = [1, 2, 3, 4, 5]
y0 = [6, 7, 2, 4, 5]
y1 = [10, 5, 4, 2, 1]

# Output to static HTML file
output_file("linked_scatter_plots.html")

# Create two new plots
s1 = figure(width=250, height=250, title="Scatter plot 1")
s2 = figure(width=250, height=250, title="Scatter plot 2", x_range=s1.x_range, y_range=s1.y_range)

# Add circle renderer
s1.circle(x, y0, size=10, color="navy", alpha=0.5)
s2.circle(x, y1, size=10, color="firebrick", alpha=0.5)

# Put plots in a gridplot
p = gridplot([[s1, s2]])

# Show the results
show(p)

The output consists of two linked scatter plots displayed side by side.

The code links the two scatter plots by passing s1’s range objects to s2 through x_range and y_range, so panning or zooming either plot moves both. The gridplot() function then arranges the plots side by side in a grid, making it easier to compare the two datasets.
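Ranges are not the only thing plots can share. The sketch below shows linked brushing, assuming both series are stored in one ColumnDataSource: because the two renderers use the same source, a box or lasso selection in one plot highlights the same rows in the other. Only the x range is shared here, so each plot keeps its own y axis.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source holds both series, so both plots refer to the same rows
source = ColumnDataSource(data={
    "x": [1, 2, 3, 4, 5],
    "y0": [6, 7, 2, 4, 5],
    "y1": [10, 5, 4, 2, 1],
})

tools = "box_select,lasso_select,reset"

s1 = figure(width=250, height=250, title="Scatter plot 1", tools=tools)
# Share only the x range; each plot gets its own y range
s2 = figure(width=250, height=250, title="Scatter plot 2", tools=tools,
            x_range=s1.x_range)

# A selection in either plot selects rows of the shared source,
# so the matching points are highlighted in both plots
s1.scatter("x", "y0", source=source, size=10, color="navy", alpha=0.5)
s2.scatter("x", "y1", source=source, size=10, color="firebrick", alpha=0.5)

grid = gridplot([[s1, s2]])
```

Pass grid to show() to view it; selecting points in one plot should then highlight the corresponding points in the other.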

Bonus One-Liner Method 5: Quick Scatter Plot

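For quick visualizations where deep customization is not necessary, figure(), circle(), and show() can be condensed into a single line. Note that circle() returns a glyph renderer rather than the figure, so the figure has to be kept in a variable before it is passed to show().

Here’s an example:

from bokeh.plotting import figure, show
p = figure(); p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20, color="olive", alpha=0.5); show(p)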

The output is a simple scatter plot displayed in the browser.

This code creates a quick scatter plot without calling output_file() or adding a title or axis labels. show() still writes an HTML file, using a default name (typically derived from the script name), before opening it in the browser, so this approach suits quick checks rather than plots you intend to keep or refine.
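When the goal is a file rather than a browser window (for example on a headless server), bokeh.io.save() writes the HTML without opening anything. The following is a minimal sketch; the file name and its location in the system temp directory are illustrative assumptions.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

p = figure(title="Quick scatter plot")
p.scatter([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20, color="olive", alpha=0.5)

# Hypothetical output location: a file in the system temp directory
path = os.path.join(tempfile.gettempdir(), "quick_scatter.html")

# Write a standalone HTML page that loads BokehJS from the CDN;
# no browser window is opened
save(p, filename=path, resources=CDN, title="Quick scatter plot")
```

Passing resources and title explicitly avoids the warnings save() emits when neither output_file() nor those arguments were provided.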

Summary/Discussion

  • Method 1: Using figure(). A robust method that works well for most needs. Offers flexibility in customization but can be verbose for simple plots.
  • Method 2: Customizing the scatter markers. Provides aesthetic flexibility, enhancing the visualization’s ability to communicate. Limited by the available marker types.
  • Method 3: Adding interactivity with hover tools. Greatly enhances user engagement and data comprehension. Can become cluttered with too much information.
  • Method 4: Linking plots. Effective for multitiered data analysis. Requires careful management of plot ranges to ensure proper linkage.
  • Method 5: Quick Scatter Plot. Ideal for rapid visualization. Lacks features for detailed analysis and refinement.