5 Best Ways to Visualize Different Shapes of Data Points in Python Using Bokeh

💡 Problem Formulation: When analyzing complex datasets, it’s crucial to distinguish between different groups or categories within the data. A common request from data scientists and analysts is to plot data in Python with a different marker shape for each category or condition. For example, one might want a scatter plot where each category of data points has a unique shape, making the categories easy to tell apart.

Method 1: Using Marker Function

Bokeh’s plotting interface provides a versatile scatter() method whose marker parameter selects the shape to draw. This approach keeps a list of marker names, one for each category of data, and passes the matching name each time scatter() is called.

Here’s an example:

from bokeh.plotting import figure, show, output_file

# output to static HTML file
output_file("scatter_shapes.html")

# create a new plot
p = figure(title = "Bokeh Marker Shapes Example")
marker_types = ['circle', 'square', 'triangle', 'diamond']

# some data
x_coords = [1, 2, 3, 4]
y_coords = [10, 20, 30, 40]
categories = ['A', 'B', 'C', 'D']

# add one scatter renderer per category, labelled in the legend
for x, y, marker, category in zip(x_coords, y_coords, marker_types, categories):
    p.scatter([x], [y], marker=marker, size=20, legend_label=category)

show(p)

The output is an HTML file, “scatter_shapes.html”, containing a scatter plot with four different shapes representing separate categories.

This code snippet shows how the marker parameter of scatter() assigns a different shape to each point. Because every category gets its own shape, it is plain which data point belongs to which category, which makes the scatter plot easier to interpret.
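With more than one point per category, the same idea is usually written as one scatter() call per category rather than per point. The sketch below assumes hypothetical sample data (the `points` and `markers` dictionaries are made up for illustration) and adds a legend entry for each shape.

```python
from bokeh.plotting import figure

# Hypothetical sample data: several points per category.
points = {
    "A": ([1, 2, 3], [10, 12, 11]),
    "B": ([1, 2, 3], [20, 22, 21]),
    "C": ([1, 2, 3], [30, 32, 31]),
}
# One built-in marker name per category (illustrative choice).
markers = {"A": "circle", "B": "square", "C": "triangle"}

p = figure(title="One scatter() call per category")
for category, (xs, ys) in points.items():
    # legend_label adds a legend entry so each shape is identified.
    p.scatter(xs, ys, marker=markers[category], size=15, legend_label=category)

# To display the plot: from bokeh.plotting import show; show(p)
```

Each call adds one renderer, so this stays manageable for a handful of categories; for many categories, a data-driven marker column is likely the better fit.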

Method 2: Using ColumnDataSource

The ColumnDataSource object in Bokeh manages the plot’s data centrally and can drive glyph properties as well as coordinates. Storing the marker name for each point as a column alongside the x and y values lets a single scatter() call render many shapes, and makes it convenient to update the data and the associated shapes together.

Here’s an example:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show, output_file

output_file("multiple_shapes.html")

source = ColumnDataSource(data={
    'x': [1, 2, 3, 4],
    'y': [50, 60, 70, 80],
    'marker': ['circle_x', 'diamond', 'inverted_triangle', 'asterisk']
})

p = figure(title = "ColumnDataSource Shapes Example")
p.scatter('x', 'y', source=source, marker='marker', size=20)

show(p)

The output is an HTML file, “multiple_shapes.html”, containing a plot in which each data point takes its shape from the ColumnDataSource.

This example illustrates a modular approach where data and marker types are stored in a ColumnDataSource. The plot pulls from this centralized source, providing a way to visualize differing shapes of data points with improved data handling, particularly for larger datasets.
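In practice the marker column is often derived from a category column rather than typed by hand. The following is a minimal sketch under that assumption; the `category` column and the `marker_for` lookup table are hypothetical names introduced for illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical records: one category label per point.
data = {
    "x": [1, 2, 3, 4, 5],
    "y": [50, 60, 70, 80, 90],
    "category": ["A", "B", "A", "C", "B"],
}

# Hypothetical lookup table from category to a built-in marker name.
marker_for = {"A": "circle", "B": "square", "C": "triangle"}

# Derive the marker column so it always stays in step with the categories.
data["marker"] = [marker_for[c] for c in data["category"]]

source = ColumnDataSource(data=data)

p = figure(title="Marker column derived from categories")
# marker="marker" refers to the column name, not to a marker type.
p.scatter("x", "y", source=source, marker="marker", size=20)

# To display the plot: from bokeh.plotting import show; show(p)
```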

Method 3: Categorical Color and Shape Mapping

Bokeh can easily handle both shape and color mapping for different categories within a dataset. This dual-encoding can make plots especially informative, allowing for differentiating data points by both color and shape simultaneously.

Here’s an example:

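from bokeh.models import ColumnDataSource
from bokeh.transform import factor_mark, factor_cmap
from bokeh.plotting import figure, show, output_file

output_file("color_shape_mapping.html")

factors = ['A', 'B', 'C', 'D']
marker_shapes = ['hex', 'cross', 'square', 'circle']
colors = ['#FF0000', '#00FF00', '#0000FF', '#FFFF00']

# factor_mark and factor_cmap look up a column by name, so the
# categories must live in a data source alongside the coordinates
source = ColumnDataSource(data={
    'x': [1, 2, 3, 4],
    'y': [100, 200, 300, 400],
    'factors': factors,
})

p = figure(title = "Categorical Color and Shape Mapping")
p.scatter('x', 'y', source=source, size=20,
          marker=factor_mark('factors', marker_shapes, factors),
          color=factor_cmap('factors', colors, factors))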

show(p)

The output is an HTML file, “color_shape_mapping.html”, containing a plot that uses both color and shape to distinguish between data points.

By using factor_mark and factor_cmap, this snippet maps a single categorical variable to two visual properties at once, which is helpful for rich and informative visualizations when dealing with complex datasets.
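factor_mark is a convenience helper that applies a CategoricalMarkerMapper transform to a column. To make that mechanism visible, the sketch below builds the mapper explicitly and passes it as a field/transform specification. The column name `group` is a hypothetical choice for illustration.

```python
from bokeh.models import CategoricalMarkerMapper, ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={
    "x": [1, 2, 3, 4],
    "y": [100, 200, 300, 400],
    "group": ["A", "B", "C", "D"],  # hypothetical category column
})

# The mapper pairs each factor with the marker at the same position.
mapper = CategoricalMarkerMapper(
    factors=["A", "B", "C", "D"],
    markers=["hex", "cross", "square", "circle"],
)

p = figure(title="Explicit CategoricalMarkerMapper")
# Read the "group" column and translate each value through the mapper.
p.scatter("x", "y", source=source, size=20,
          marker={"field": "group", "transform": mapper})

# To display the plot: from bokeh.plotting import show; show(p)
```

Building the mapper by hand is mainly useful when the same mapping should be shared between several plots.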

Method 4: Custom Glyph Method

When the basic shapes are not distinctive enough, Bokeh also ships combination markers such as circle_cross and diamond_cross, which overlay one shape on another. These are built-in marker types rather than user-defined glyphs, and they are selected in the same way as any other marker.

Here’s an example:

from bokeh.plotting import figure, show, output_file

output_file("custom_glyphs.html")

p = figure(title="Custom Glyphs Example")

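# Combination markers selected by name; the older p.circle_cross() and
# p.diamond_cross() convenience methods are deprecated in recent Bokeh releases
p.scatter(x=[1], y=[100], marker="circle_cross", size=25, color="navy", alpha=0.5)
p.scatter(x=[2], y=[150], marker="diamond_cross", size=25, color="firebrick", alpha=0.5)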

show(p)

The output is an HTML file, “custom_glyphs.html”, containing a plot drawn with the combination markers circle_cross and diamond_cross.

This snippet uses the circle_cross and diamond_cross markers, which are part of Bokeh’s built-in marker set. They are useful when the basic shapes are not distinctive enough to separate the categories at a glance.
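To see which marker names are available in the installed Bokeh version, the MarkerType enumeration in bokeh.core.enums can be iterated. The sketch below lists the names and draws one point per marker; the six-column grid layout is an arbitrary choice for illustration.

```python
from bokeh.core.enums import MarkerType
from bokeh.plotting import figure

# All marker names known to the installed Bokeh version.
available = list(MarkerType)
print(available)

p = figure(title="Available marker types")
columns = 6  # arbitrary grid width for this illustration
for i, name in enumerate(available):
    # Place each marker on a simple grid, one renderer per marker.
    p.scatter([i % columns], [i // columns], marker=name, size=20)

# To display the plot: from bokeh.plotting import show; show(p)
```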

Bonus One-Liner Method 5: Inline Visualization

For a quick representation of varying shapes within a notebook or a web application, you can pass a list of marker names directly to a single scatter() call. Because scatter() returns a renderer rather than the figure, the figure is created first and then passed to show().

Here’s an example:

from bokeh.plotting import figure, show

# In a notebook, call bokeh.io.output_notebook() first to render inline
p = figure(title="Inline Scatter Example")
# The marker list is passed directly to a single scatter() call
p.scatter([1, 2, 3], [4, 5, 6], marker=['star', 'circle', 'triangle'], size=20)
show(p)

This renders a Bokeh scatter plot with three points, each having a different shape: a star, a circle, and a triangle.

While this one-liner is less flexible than the other methods, it provides a quick way to generate a simple scatter plot with varied markers for users who want to immediately visualize their data without more comprehensive data management.

Summary/Discussion

  • Method 1: Using Marker Function. Straightforward and simple. Adds one renderer per call and is limited to the predefined markers in Bokeh’s library.
  • Method 2: Using ColumnDataSource. Highly scalable and manageable, especially for large datasets. Requires initial setup of data source.
  • Method 3: Categorical Color and Shape Mapping. Offers rich dual-encoding visuals. The method can become complex when dealing with many categories.
  • Method 4: Custom Glyph Method. Adds distinctive combination markers such as circle_cross. Still limited to the marker types that Bokeh ships.
  • Method 5: Inline Visualization. Quick and easy; suitable for exploratory data analysis. Not ideal for precise or customizable visualizations.