5 Best Ways to Plot an ROC Curve in Python

💡 Problem Formulation: In machine learning classification tasks, evaluating model performance is critical. A Receiver Operating Characteristic (ROC) curve is a plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. This article demonstrates how to plot an ROC curve in Python using several methods; the input is a model's predicted scores and the output is the ROC curve plot.
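To make the definition concrete, here is a minimal sketch that computes the (FPR, TPR) pair by hand at a few thresholds. The labels, scores, and the helper name rates_at are hypothetical and chosen only for illustration; the library functions used later in the article do this work for you.

```python
import numpy as np

# Hypothetical toy data: true labels and classifier scores (higher = more positive)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])


def rates_at(threshold, y_true, y_score):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & (y_true == 1))  # true positives
    fp = np.sum(y_pred & (y_true == 0))  # false positives
    tpr = tp / np.sum(y_true == 1)       # share of positives that were caught
    fpr = fp / np.sum(y_true == 0)       # share of negatives that were flagged
    return float(fpr), float(tpr)


# Sweep the threshold from strict to lenient; each threshold gives one ROC point
thresholds = (0.9, 0.8, 0.4, 0.35, 0.1)
points = [rates_at(t, y_true, y_score) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t:<4}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Connecting these points in order, from (0, 0) at the strictest threshold to (1, 1) at the most lenient, traces the ROC curve.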

Method 1: Using Matplotlib and sklearn.metrics

The Matplotlib library in tandem with sklearn.metrics allows for plotting ROC curves with flexibility in styling and annotations. The roc_curve and auc functions of sklearn.metrics are used to compute the points on the ROC curve and the Area Under the Curve (AUC) respectively. Customization of the plot is achieved through Matplotlib's native plot commands.

Here's an example:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate a dataset and split it
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# Train a Logistic Regression classifier
model = LogisticRegression()
model.fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]

# Compute ROC curve and ROC area
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

# Plot ROC Curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0,1], [0,1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0,1.0])
plt.ylim([0.0,1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()

The output is an ROC curve plotting the true positive rate against the false positive rate, with the AUC score shown in the legend.

This code snippet first creates a synthetic binary classification dataset, then splits it into training and testing sets. A logistic regression model is fitted to the training data. The predict_proba method is used to get the prediction scores required for the ROC curve. Using the roc_curve and auc functions from sklearn.metrics, the ROC curve points and AUC are calculated and then plotted using Matplotlib.
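As a sanity check on the two-step computation described above, the following sketch compares the area obtained from auc(fpr, tpr) with the value returned by sklearn.metrics.roc_auc_score, which computes the same quantity directly from labels and scores. It reuses the data setup from Method 1; the variable names area_from_curve and area_direct are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic data and model as in Method 1
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
model = LogisticRegression().fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]

# Two routes to the same number: integrate the curve, or ask for the score directly
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_test, y_scores)

print("Areas agree:", np.isclose(area_from_curve, area_direct))
# The thresholds array runs from strictest to most lenient
print("Thresholds are non-increasing:", bool(np.all(np.diff(thresholds) <= 0)))
```

If only the number is needed and not the plot, roc_auc_score alone is enough; roc_curve is required when the curve itself is drawn.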

Method 2: Using Seaborn and sklearn.metrics

Seaborn is a visualization library that builds on top of Matplotlib, providing a high-level interface for drawing attractive statistical graphics. While it doesn't have a built-in ROC plotting function, it can be combined with sklearn.metrics to plot ROC curves with polished default styling.

Here's an example:

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_classes=2, flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Build and train the classifier
classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train, y_train)
y_pred_proba = classifier.predict_proba(X_test)[:, 1]

# Calculate ROC Curve and AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Customize the seaborn style
sns.set_style("whitegrid")

# Plot the ROC Curve
plt.figure(figsize=(10, 8))
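# Pass x and y as keywords (recent Seaborn versions reject positional data).
# estimator=None and sort=False keep every ROC point in its original order;
# by default lineplot would average y over repeated x values.
sns.lineplot(x=fpr, y=tpr, estimator=None, sort=False, label='ROC curve (area = %0.2f)' % roc_auc)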
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve using Seaborn with an AUC of %0.2f' % roc_auc)
plt.legend(loc="lower right")
plt.show()

The output is an ROC curve with Seaborn's styling, including the AUC score in the legend.

This code block shows how Seaborn can be used to plot ROC curves: it generates a synthetic dataset, splits it, and trains a RandomForestClassifier. The probabilities for the positive class are obtained, and the ROC points and AUC are calculated with roc_curve and auc in the same way as in the previous example. The curve is then plotted using Seaborn's lineplot function, with Matplotlib's figure settings and Seaborn's style adjustments for improved aesthetics.

Method 3: Using Plotly for Interactive Plots

Plotly is a graphing library that makes interactive, publication-quality graphs online. For an interactive visualization of the ROC curve, which can be beneficial for presentations or reports where reader engagement is necessary, Plotly can be the library of choice.

Here's an example:

import plotly.graph_objs as go
from sklearn.metrics import roc_curve, auc
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=7)

# Train an SVM classifier
svm = SVC(probability=True)
svm.fit(X_train, y_train)
y_scores = svm.decision_function(X_test)

# Compute ROC curve and AUC
fpr, tpr, _ = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

# Create the ROC Curve trace
trace = go.Scatter(x=fpr, y=tpr, mode='lines', name='AUC = %0.2f' % roc_auc,
                   line=dict(color='darkorange', width=2))
reference_line = go.Scatter(x=[0,1], y=[0,1], mode='lines', name='Reference Line',
                            line=dict(color='navy', width=2, dash='dash'))

# Construct the figure
fig = go.Figure(data=[trace, reference_line])
fig.update_layout(title='Interactive ROC Curve',
                  xaxis_title='False Positive Rate',
                  yaxis_title='True Positive Rate',
                  margin=dict(l=40, r=0, t=40, b=30))
fig.show()

The output is an interactive ROC Curve that readers can hover over to see details, enhancing user engagement and insight.

This example uses Plotly to draw an interactive ROC curve. A support vector classifier (SVC) is trained with probability=True, but the scores passed to roc_curve come from decision_function: ROC analysis only needs a score that ranks the samples, so calibrated probabilities are not required here. The roc_curve and auc functions then give the coordinates of the curve and the area under it. Plotly's Scatter traces draw the curve and a reference line, and fig.show() displays the figure in an interactive view.
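An ROC curve depends only on how the scores rank the samples, which is why raw decision scores can be passed to roc_curve just as well as probabilities. The sketch below illustrates this with made-up scores: applying a strictly increasing transformation (here a logistic squashing) leaves the curve and the AUC unchanged. The data and variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Hypothetical labels and raw decision scores; positives are shifted upward
y_true = rng.integers(0, 2, size=200)
raw_scores = rng.normal(loc=y_true * 1.5, scale=1.0)

# A strictly increasing transformation of the scores (maps them into (0, 1))
squashed = 1.0 / (1.0 + np.exp(-raw_scores))

auc_raw = roc_auc_score(y_true, raw_scores)
auc_squashed = roc_auc_score(y_true, squashed)

fpr_raw, tpr_raw, _ = roc_curve(y_true, raw_scores)
fpr_sq, tpr_sq, _ = roc_curve(y_true, squashed)

print("Same AUC:", np.isclose(auc_raw, auc_squashed))
print("Same curve:", np.array_equal(fpr_raw, fpr_sq) and np.array_equal(tpr_raw, tpr_sq))
```

Only the thresholds array differs between the two calls, since thresholds are expressed on the scale of whichever scores were supplied.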

Method 4: Utilizing Yellowbrick for Model Visualization

Yellowbrick extends the Scikit-learn API to facilitate model selection and tuning with visual representations. Yellowbrick's ROCAUC class simplifies ROC curve plotting by wrapping an estimator, and it can draw several ROC curves on the same axes (by default, per-class curves together with micro- and macro-averaged curves).

Here's an example:

from yellowbrick.classifier import ROCAUC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the classification model
model = LogisticRegression()
visualizer = ROCAUC(model, classes=["Not Purchased", "Purchased"])

# Fit the model, then score with the test data
visualizer.fit(X_train, y_train)
visualizer.score(X_test, y_test)
visualizer.show()

The output is an ROC plot drawn by Yellowbrick's visualizer, with the AUC values shown in the legend.

In this code snippet, an ROC curve is plotted with very little code using the Yellowbrick library. After a synthetic dataset is created, the ROCAUC visualizer wraps a Logistic Regression model and is fitted with the training data. It is then scored with the testing data, producing an ROC plot that shows the model's performance along with the AUC.

Bonus One-Liner Method 5: Quick Plot with pandas and matplotlib

For those seeking a minimalistic approach, pandas in combination with Matplotlib can be used to quickly plot an ROC curve with very little code.

Here's an example:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

# Synthetic Data
X, y = make_classification(n_samples=1000, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Classifier
mlp = MLPClassifier(max_iter=1000)
mlp.fit(X_train, y_train)
y_pred_prob = mlp.predict_proba(X_test)[:,1]

# Compute ROC and AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_prob)
roc_auc = auc(fpr, tpr)

# Plot
pd.DataFrame({'FPR': fpr, 'TPR': tpr}).set_index('FPR')['TPR'].plot()
plt.plot([0, 1], [0, 1], 'k--')
plt.title('ROC Curve with AUC={:.2f}'.format(roc_auc))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()

The output is a simple ROC curve, with the true positive rate on the y-axis and the false positive rate on the x-axis, and the AUC score in the title.

The one-liner label refers to the plotting step: a pandas DataFrame holding the FPR and TPR arrays is indexed by FPR and plotted in a single chained call. The data preparation and AUC computation are the same as in the earlier methods. This gives a very compact way to take a quick look at model performance.

Summary/Discussion

  • Method 1: Matplotlib and sklearn.metrics. Strength: Highly customizable plots. Weakness: Can be verbose for simple uses.
  • Method 2: Seaborn and sklearn.metrics. Strength: Offers an easy way to create attractive plots. Weakness: Limited by Seaborn's level of customization.
  • Method 3: Plotly. Strength: Creates interactive plots for better interpretation. Weakness: The output is interactive rather than static, which can be inconvenient for documentation or printouts.
  • Method 4: Yellowbrick. Strength: Integrates smoothly with Scikit-learn models for direct performance visualization. Weakness: Less familiar to those more accustomed to Scikit-learn or Matplotlib.
  • Bonus Method 5: Pandas and Matplotlib. Strength: Quick and simple one-liner plotting. Weakness: Limited customization and functionality compared to the other methods.