5 Best Ways to Plot a Time Series Array with Confidence Intervals in Python Matplotlib

💡 Problem Formulation: In data analysis, representing uncertainty graphically is crucial, especially in time series where predictions and actual measurements may differ. This article shows how to visualize time series data alongside its confidence intervals using Python’s Matplotlib library, an essential tool for data scientists who want to show how robust a prediction is. For a given set of time series data points and their respective confidence intervals, we aim to plot a graph that clearly depicts the trend over time, flanked by the upper and lower confidence bounds.

Method 1: Basic Plot with Fill Between

This method uses the plot() function for drawing the time series and fill_between() to shade the area representing the confidence interval. The fill_between() method of Matplotlib creates a filled area between two horizontal curves, which is perfect for confidence intervals.

Here’s an example:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
time = np.arange(0, 10, 0.1)
actual_data = np.sin(time)
std_dev = 0.1
lower_bound = actual_data - std_dev
upper_bound = actual_data + std_dev

# Plotting
plt.plot(time, actual_data, label='Actual Data')
plt.fill_between(time, lower_bound, upper_bound, color='gray', alpha=0.2, label='Confidence Interval')
plt.legend()
plt.show()

The output is a plot with the actual time series data in a solid line surrounded by a shaded area representing the confidence interval.

This code snippet first generates a simple sine wave as our time series data. It then builds the lower and upper bounds by subtracting and adding a fixed standard deviation of 0.1; this band is a stand-in for illustration, since a real confidence interval would be estimated from the data. The series is drawn with plot() and the band with fill_between(), with alpha controlling the transparency.
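When repeated measurements are available, the bounds can be estimated rather than assumed. The following is a minimal sketch, not the article’s own method: it assumes 30 simulated noisy runs of the same sine wave and uses the normal approximation (mean plus or minus 1.96 standard errors) for a 95% interval. The helper name mean_confidence_band and the simulated data are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def mean_confidence_band(samples, z=1.96):
    """Return (mean, lower, upper) across the rows of a (n_runs, n_times) array.

    Assumption: normal approximation, i.e. mean +/- z * standard error.
    """
    samples = np.asarray(samples, dtype=float)
    n_runs = samples.shape[0]
    mean = samples.mean(axis=0)
    # ddof=1 gives the sample (not population) standard deviation
    sem = samples.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical data: 30 noisy repetitions of the same sine-wave signal
rng = np.random.default_rng(0)
time = np.arange(0, 10, 0.1)
samples = np.sin(time) + rng.normal(scale=0.3, size=(30, time.size))

mean, lower, upper = mean_confidence_band(samples)

fig, ax = plt.subplots()
ax.plot(time, mean, label='Mean of runs')
ax.fill_between(time, lower, upper, color='gray', alpha=0.3,
                label='95% CI (normal approx.)')
ax.legend()
```

Call plt.show() afterwards to display the figure. With only a handful of runs, a t-distribution critical value would be a better choice than 1.96.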

Method 2: Errorbar Plot with Confidence Interval

The errorbar() function in Matplotlib is typically used to represent the deviation of data points. When plotting time series data, it can also be repurposed to show confidence intervals.

Here’s an example:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
time = np.arange(0, 10, 0.5)
actual_data = np.sin(time)
error = np.linspace(0.05, 0.2, len(time))

# Plotting with error bars
plt.errorbar(time, actual_data, yerr=error, label='Data with Confidence Interval', fmt='-o')
plt.legend()
plt.show()

The output is a plot with error bars extending above and below the data points, showing the range of the confidence interval at each time step.

In this example, we create a sine wave and simulate increasing error over time. The errorbar() function is then used to plot this data, with yerr providing the symmetric error range for the confidence interval and fmt='-o' defining the format of the data line and markers.
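Confidence intervals are not always symmetric. As a hedged sketch, errorbar() also accepts a yerr array of shape (2, N), where the first row holds the distances below each point and the second row the distances above it. The bound values used here are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

time = np.arange(0, 10, 0.5)
actual_data = np.sin(time)

# Hypothetical asymmetric bounds: wider above each point than below it
lower_bound = actual_data - 0.05
upper_bound = actual_data + 0.20

# errorbar() expects distances from the data point, not absolute bounds.
# A (2, N) array is read as [distances below, distances above].
yerr = np.vstack([actual_data - lower_bound, upper_bound - actual_data])

fig, ax = plt.subplots()
ax.errorbar(time, actual_data, yerr=yerr, fmt='-o', capsize=3,
            label='Data with asymmetric interval')
ax.legend()
```

Call plt.show() afterwards to display the figure. Converting absolute bounds into distances first is the step most easily forgotten, and the distances must be non-negative.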

Method 3: Stacked Line Plot for Confidence Interval

In this method, we use two plot() calls to draw the upper and lower bounds of the confidence interval as lines, giving a clear indication of its boundaries in relation to the time series data.

Here’s an example:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
time = np.linspace(0, 10, 100)
actual_data = np.sin(time)
# Half-width of the interval; abs() keeps it non-negative so the bounds never swap
conf_interval = 0.2 * np.abs(np.cos(0.5 * time))

# Plotting the time series and confidence intervals
plt.plot(time, actual_data, label='Actual Data')
plt.plot(time, actual_data + conf_interval, linestyle='--', color='red', label='Upper Confidence Bound')
plt.plot(time, actual_data - conf_interval, linestyle='--', color='red', label='Lower Confidence Bound')
plt.legend()
plt.show()

The output is a plot with a solid line for the actual data and dashed lines representing the upper and lower bounds of the confidence interval.

We generate data arrays for the actual values and the bounds of the confidence intervals, which are then plotted using the plot() function with different linestyle and color arguments to distinguish the time series from the bounds.

Method 4: Bar Chart with Confidence Interval

For discrete time series data, a bar chart can be used to represent values, with error bars indicating the confidence interval. Matplotlib’s bar() function accepts a yerr argument that draws these error bars directly, so no separate errorbar() call is needed.

Here’s an example:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
categories = ['Jan', 'Feb', 'Mar', 'Apr']
values = [10, 15, 13, 17]
errors = [1, 0.5, 1.5, 1]

x_pos = np.arange(len(categories))

# Plotting a bar chart with confidence intervals
plt.bar(x_pos, values, yerr=errors, align='center', alpha=0.5, ecolor='black', capsize=10)
plt.xticks(x_pos, categories)
plt.show()

The output is a bar chart in which each bar carries a vertical error bar with horizontal caps, indicating the range of the confidence interval.

This snippet uses categorical data to represent the mean values of some metric per month. The bar() function plots this data, and the yerr parameter adds vertical error bars with the capsize property specifying the width of the horizontal cap at the end of each error bar.
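In practice the bar heights and error values usually come from raw observations. The sketch below assumes five invented observations per month, chosen so that their means match the values above, and derives an approximate 95% interval from the standard error. With so few observations the factor 1.96 understates the interval, so treat the numbers as illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical raw observations per month (not from the original example)
observations = {
    'Jan': [9, 11, 10, 12, 8],
    'Feb': [15, 14, 16, 15, 15],
    'Mar': [12, 15, 11, 14, 13],
    'Apr': [17, 18, 16, 17, 17],
}

categories = list(observations)
values = [np.mean(obs) for obs in observations.values()]
# Standard error of the mean, scaled by 1.96 for an approximate 95% interval
errors = [1.96 * np.std(obs, ddof=1) / np.sqrt(len(obs))
          for obs in observations.values()]

x_pos = np.arange(len(categories))
fig, ax = plt.subplots()
ax.bar(x_pos, values, yerr=errors, alpha=0.5, ecolor='black', capsize=10)
ax.set_xticks(x_pos)
ax.set_xticklabels(categories)
```

Call plt.show() afterwards to display the figure.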

Bonus One-Liner Method 5: Plotting with Seaborn

While not part of Matplotlib, Seaborn is a statistical plotting library built atop Matplotlib and offers a higher-level interface for drawing attractive statistical graphics, including confidence intervals with a one-liner function.

Here’s an example:

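import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Sample data: 20 noisy observations at each time step
# (lineplot only draws a confidence band when x values repeat)
time = np.tile(np.arange(0, 10, 0.1), 20)
actual_data = np.sin(time) + np.random.normal(size=len(time), scale=0.1)

# lineplot aggregates repeated x values into a mean line with a confidence band
sns.lineplot(x=time, y=actual_data)
plt.show()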

When the data contain several observations per time step, the output is a line through their mean with a shaded band depicting the confidence interval, computed by Seaborn internally.

Seaborn’s lineplot() function aggregates repeated observations at each x value and, by default, draws a bootstrapped 95% confidence interval around their mean. With only one observation per x value there is nothing to aggregate, so no band is drawn. This method is concise and useful for quick, attractive visualizations with minimal code.

Summary/Discussion

  • Method 1: Basic Plot with Fill Between. Easy to implement. Provides a clear and direct visual of confidence intervals as a shaded area. However, the shaded bands become hard to tell apart when several series overlap on the same axes.
  • Method 2: Errorbar Plot. Effective for emphasizing individual data points. Suitable for sparse datasets. The confidence interval visualization may become cluttered with crowded data points.
  • Method 3: Stacked Line Plot. Offers a clear boundary visualization for confidence intervals. Best suited when it’s important to outline the exact limits of intervals. May be visually overwhelming if too many intervals are plotted.
  • Method 4: Bar Chart with Confidence Interval. Ideal for discrete categorical data. The error bars provide a simple perception of variance. It’s not suitable for continuous time series data.
  • Method 5: Plotting with Seaborn. Provides a one-liner solution with an attractive output. However, it computes the interval itself from repeated observations, so it offers less control and is less suited to plotting bounds you have already calculated.