5 Best Ways to Compare Histograms of Two Images Using OpenCV Python

💡 Problem Formulation: When working with image data, comparing histograms can be crucial for tasks such as image classification, object recognition, or image similarity detection. Given two images, we aim to compare their intensity or color distributions using OpenCV and Python, producing similarity scores that indicate how closely the images match.

Method 1: Correlation

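Correlation treats the two histograms as sequences of bin values and computes their Pearson correlation coefficient, which measures how linearly related the two intensity distributions are. Scores range from -1.0 to 1.0. A score of 1.0 means the bins are perfectly linearly related: identical histograms score 1.0, but so do histograms that differ only by a positive scale factor or offset. Values near 0 indicate no linear relationship. OpenCV performs this comparison with the cv2.compareHist() function and the cv2.HISTCMP_CORREL flag.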

Here’s an example:

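import cv2
# Load both images as single-channel grayscale
image1 = cv2.imread('image1.jpg', cv2.IMREAD_GRAYSCALE)
image2 = cv2.imread('image2.jpg', cv2.IMREAD_GRAYSCALE)
if image1 is None or image2 is None:
    raise FileNotFoundError('Could not read image1.jpg or image2.jpg')
# Calculate 256-bin intensity histograms
hist1 = cv2.calcHist([image1], [0], None, [256], [0, 256])
hist2 = cv2.calcHist([image2], [0], None, [256], [0, 256])
# Normalize so each histogram sums to 1, making scores independent of image size
hist1 = cv2.normalize(hist1, None, alpha=1.0, norm_type=cv2.NORM_L1)
hist2 = cv2.normalize(hist2, None, alpha=1.0, norm_type=cv2.NORM_L1)
# Compare histograms using correlation
comparison = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)
print('Correlation:', comparison)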

Output:

Correlation: 0.9265

This code snippet loads two images, computes a 256-bin grayscale histogram for each, and compares them using the correlation method. The closer the result is to 1.0, the more similar the two intensity distributions are; values near 0 indicate no linear relationship, and negative values indicate inversely related histograms.
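As a rough illustration of what cv2.HISTCMP_CORREL computes, the sketch below implements the correlation formula from the OpenCV documentation in plain Python, assuming histograms are given as equal-length lists of bin values. The function name correlation is hypothetical. It also shows why a score of 1.0 does not strictly require identical histograms: a histogram and a scaled copy of it correlate perfectly.

```python
import math


def correlation(h1, h2):
    """Pearson correlation between two histograms, bin by bin.

    Follows the formula documented for cv2.HISTCMP_CORREL.
    Returns a value in [-1, 1]; 1.0 means the bins are perfectly
    positively linearly related.
    """
    if len(h1) != len(h2) or not h1:
        raise ValueError("histograms must be non-empty and the same length")
    n = len(h1)
    mean1 = sum(h1) / n
    mean2 = sum(h2) / n
    # Covariance numerator and the product of the two variance terms
    num = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - mean1) ** 2 for a in h1) *
                    sum((b - mean2) ** 2 for b in h2))
    # Undefined when either histogram is flat; this sketch returns NaN
    # (an assumption: OpenCV may handle this case differently)
    if den == 0:
        return math.nan
    return num / den


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: scaled copy
print(correlation([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0: reversed
```

In practice cv2.compareHist() is the function to call; the sketch only makes the formula concrete.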

Method 2: Chi-Squared

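The Chi-Squared distance sums, over all bins, the squared difference between the two histograms divided by the corresponding value of the first histogram, so the first histogram acts as the "expected" distribution and the second as the "observed" one. Lower values imply more similarity, and 0 indicates an identical match. Because the first argument is the reference, swapping the arguments generally changes the result; OpenCV also offers a symmetric variant, cv2.HISTCMP_CHISQR_ALT. Both are computed with cv2.compareHist().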

Here’s an example:

# hist1 and hist2 are the histograms computed in Method 1
comparison_chi = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CHISQR)
print('Chi-Squared:', comparison_chi)

Output:

Chi-Squared: 2.3421

This code compares the histograms using the Chi-Squared method. The smaller the value, the better the match. Because each term is divided by the first histogram's bin value, sparsely populated bins in hist1 can dominate the score. Chi-Squared is commonly used in texture analysis and classification.
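A minimal plain-Python sketch of the two chi-squared formulas OpenCV documents, d = sum((H1 - H2)^2 / H1) for cv2.HISTCMP_CHISQR and d = 2 * sum((H1 - H2)^2 / (H1 + H2)) for cv2.HISTCMP_CHISQR_ALT, makes the asymmetry visible. The function names are hypothetical, and skipping bins with a zero denominator is an assumption of this sketch.

```python
def chi_square(h1, h2):
    """Chi-squared distance following the cv2.HISTCMP_CHISQR formula.

    h1 plays the role of the "expected" histogram: each squared
    difference is divided by the h1 bin. Bins where h1 is zero are
    skipped here to avoid division by zero.
    """
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a != 0)


def chi_square_alt(h1, h2):
    """Symmetric variant following the cv2.HISTCMP_CHISQR_ALT formula."""
    return 2 * sum((a - b) ** 2 / (a + b)
                   for a, b in zip(h1, h2) if a + b != 0)


p = [0.5, 0.5]
q = [0.25, 0.75]
print(chi_square(p, q), chi_square(q, p))          # 0.25 vs about 0.333
print(chi_square_alt(p, q), chi_square_alt(q, p))  # equal: symmetric
```

If argument order should not matter, for example when clustering images rather than matching against a fixed reference, the symmetric variant is the safer choice.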

Method 3: Intersection

Histogram intersection measures how much two histograms overlap. Unlike the distance measures, higher values mean greater similarity. A perfect match returns the sum of the histogram values, which for normalized histograms equals 1.0. It is often suggested for scenarios where lighting varies somewhat, although a global brightness shift still moves pixels into different bins and lowers the score.

Here’s an example:

# Higher intersection means more overlap between hist1 and hist2
comparison_inter = cv2.compareHist(hist1, hist2, cv2.HISTCMP_INTERSECT)
print('Intersection:', comparison_inter)

Output:

Intersection: 0.8842

In this snippet, the histograms are compared using intersection: for each bin pair the smaller value is taken, and these minimums are summed into a similarity score. With unnormalized histograms (raw pixel counts), the score is bounded by the smaller histogram's total rather than by 1.0.
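The intersection score and the role of normalization can be sketched in a few lines of plain Python, assuming histograms are lists of bin values. l1_normalize and intersection are hypothetical helper names; in OpenCV the equivalent steps are cv2.normalize() with cv2.NORM_L1 and cv2.compareHist() with cv2.HISTCMP_INTERSECT.

```python
def l1_normalize(hist):
    """Scale a histogram so its bins sum to 1 (same idea as cv2.NORM_L1)."""
    total = sum(hist)
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    return [v / total for v in hist]


def intersection(h1, h2):
    """Sum over bins of the smaller of the two values,
    following the cv2.HISTCMP_INTERSECT formula."""
    return sum(min(a, b) for a, b in zip(h1, h2))


h_a = l1_normalize([10, 30, 60])  # pixel counts for a hypothetical image A
h_b = l1_normalize([30, 30, 40])  # pixel counts for a hypothetical image B
print(intersection(h_a, h_a))     # identical: 1.0, up to rounding
print(intersection(h_a, h_b))     # 0.1 + 0.3 + 0.4 = 0.8, up to rounding
```

Without the normalization step, two identical images of different sizes would produce different intersection scores, which is why normalizing first is usually advisable.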

Method 4: Hellinger (Bhattacharyya) Distance

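The Bhattacharyya distance measures the similarity between two probability distributions. In OpenCV, cv2.HISTCMP_BHATTACHARYYA actually computes the Hellinger distance, which is derived from the Bhattacharyya coefficient; cv2.HISTCMP_HELLINGER is a synonym for the same flag. The result lies between 0 and 1, and a smaller value indicates a higher degree of similarity. Because the formula normalizes both histograms internally, it is insensitive to differences in total pixel count, which makes it a common choice for comparing image content; strong exposure changes, however, still shift the histograms and increase the distance.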

Here’s an example:

# cv2.HISTCMP_HELLINGER is a synonym for this flag
comparison_bh = cv2.compareHist(hist1, hist2, cv2.HISTCMP_BHATTACHARYYA)
print('Hellinger (Bhattacharyya) Distance:', comparison_bh)

Output:

Hellinger (Bhattacharyya) Distance: 0.1357

Applying this method gives a bounded measure of how different the histograms are: 0 means the distributions are identical, and 1 means they do not overlap at all. It is frequently used to compare images taken under different conditions, although large illumination changes will still raise the distance.
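The sketch below follows the formula in the OpenCV documentation for cv2.HISTCMP_BHATTACHARYYA, d = sqrt(1 - sum(sqrt(H1 * H2)) / sqrt(mean(H1) * mean(H2) * N^2)), where N is the number of bins. Since mean(H1) * mean(H2) * N^2 equals sum(H1) * sum(H2), the sketch uses the sums directly. The function name hellinger is hypothetical, and histograms are assumed to be plain lists of non-negative values.

```python
import math


def hellinger(h1, h2):
    """Distance following the cv2.HISTCMP_BHATTACHARYYA formula.

    The Bhattacharyya coefficient sum(sqrt(h1 * h2)) is divided by
    sqrt(sum(h1) * sum(h2)), so inputs need not be pre-normalized.
    Returns 0.0 for histograms of the same shape and 1.0 for
    non-overlapping histograms.
    """
    s1, s2 = sum(h1), sum(h2)
    if s1 <= 0 or s2 <= 0:
        raise ValueError("histograms must have positive totals")
    bc = sum(math.sqrt(a * b) for a, b in zip(h1, h2)) / math.sqrt(s1 * s2)
    # Clamp: floating-point rounding can push bc slightly above 1
    return math.sqrt(max(0.0, 1.0 - bc))


print(hellinger([1, 2, 3], [2, 4, 6]))  # about 0: same shape, different totals
print(hellinger([1, 0], [0, 1]))        # 1.0: no overlap
```

The first call shows the internal normalization at work: doubling every bin leaves the distance at (essentially) zero.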

Bonus One-Liner Method 5: Alternative Distance Measures

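OpenCV also provides other ways to compare histograms. Kullback-Leibler divergence is available in cv2.compareHist() through the cv2.HISTCMP_KL_DIV flag, while the Earth Mover's Distance is computed by a separate function, cv2.EMD(), which takes signatures (weights paired with coordinates) rather than histograms. Each suits different scenarios and characteristics of the image data being compared.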
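The Earth Mover's Distance is easiest to understand in its one-dimensional special case: for two 1-D histograms with equal total mass and a ground distance of one per bin, EMD equals the sum of absolute differences between the cumulative histograms. The pure-Python sketch below (hypothetical function name emd_1d) covers only that special case and is not a substitute for cv2.EMD() in general.

```python
import math
from itertools import accumulate


def emd_1d(h1, h2):
    """Earth Mover's Distance between two 1-D histograms.

    Assumes equal-length histograms with equal total mass (for example,
    both L1-normalized) and a ground distance of 1 between adjacent bins.
    Under those assumptions EMD equals the L1 distance between the CDFs.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    if not math.isclose(sum(h1), sum(h2)):
        raise ValueError("histograms must have equal total mass")
    return sum(abs(c1 - c2) for c1, c2 in zip(accumulate(h1), accumulate(h2)))


print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1: all mass moves one bin
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2: all mass moves two bins
```

Bin-by-bin measures such as intersection score both pairs as completely non-overlapping, while EMD reports that the second pair is farther apart. That sensitivity to how far mass moves is the main reason to reach for EMD.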

Here’s an example:

# Kullback-Leibler divergence
comparison_kl = cv2.compareHist(hist1, hist2, cv2.HISTCMP_KL_DIV)
print("Kullback-Leibler divergence:", comparison_kl)

Output:

Kullback-Leibler divergence: 0.0123

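This code snippet compares the histograms using the Kullback-Leibler divergence. A lower value indicates higher similarity, and 0 means the histograms are identical. KL divergence treats the histograms as probability distributions, so they should be normalized, and it is not symmetric: swapping hist1 and hist2 generally changes the result.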
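To make the asymmetry concrete, here is a plain-Python sketch of the formula OpenCV documents for cv2.HISTCMP_KL_DIV, d = sum(H1 * log(H1 / H2)), assuming both histograms are already L1-normalized lists. The function name kl_divergence is hypothetical, and its handling of empty bins is an assumption of this sketch.

```python
import math


def kl_divergence(h1, h2):
    """Kullback-Leibler divergence sum(p * log(p / q)).

    Both histograms are assumed to be L1-normalized. Bins with p == 0
    contribute nothing; a bin with p > 0 but q == 0 makes the divergence
    infinite in this sketch (library implementations may guard this case
    differently).
    """
    total = 0.0
    for p, q in zip(h1, h2):
        if p == 0:
            continue
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q))  # about 0.1438
print(kl_divergence(q, p))  # about 0.1308: KL is not symmetric
```

Because a single empty bin in the second histogram can blow the score up, KL divergence is usually applied to histograms that have been smoothed or have no empty bins.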

Summary/Discussion

Method 1: Correlation. Intuitive, with scores from -1 to 1 where higher is better. Captures linear relationships between bins but can be misleading with non-linear intensity variations.

Method 2: Chi-Squared. Well suited when histograms are expected to match closely (lower is better). Sensitive to sparsely populated bins, asymmetric in its arguments, and not robust to changes in image exposure.

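Method 3: Intersection. Simple overlap score where higher is better, and tolerant of moderate lighting differences. Does not reflect how far intensity mass has shifted between bins, so it may not be as informative for structural differences.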

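Method 4: Hellinger (Bhattacharyya) Distance. Bounded between 0 and 1 (lower is better) and insensitive to the total pixel count, which makes it good for content comparison. Like the other bin-by-bin measures, it ignores spatial layout.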

Method 5: Alternative Distance Measures. KL divergence and the Earth Mover's Distance offer more options for complex scenarios but require more care to interpret: KL is asymmetric, and EMD needs signatures and is more expensive to compute.
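Because correlation and intersection are similarities while the other methods are distances, it is easy to sort candidates in the wrong direction when ranking several images against a query. The small lookup below is one way to guard against that; the dictionary keys, the function rank_by_similarity, and the example scores are all hypothetical, with real scores coming from cv2.compareHist().

```python
# Direction of each cv2.compareHist method: True means a larger score
# indicates more similar histograms.
HIGHER_IS_MORE_SIMILAR = {
    "correlation": True,     # cv2.HISTCMP_CORREL
    "intersection": True,    # cv2.HISTCMP_INTERSECT
    "chi-squared": False,    # cv2.HISTCMP_CHISQR
    "bhattacharyya": False,  # cv2.HISTCMP_BHATTACHARYYA
    "kl-divergence": False,  # cv2.HISTCMP_KL_DIV
}


def rank_by_similarity(scores, method):
    """Return candidate names ordered from most to least similar.

    scores maps a candidate name (for example, a file name) to the
    value cv2.compareHist returned for it under the given method.
    """
    reverse = HIGHER_IS_MORE_SIMILAR[method]
    return sorted(scores, key=scores.get, reverse=reverse)


print(rank_by_similarity({"a.jpg": 0.93, "b.jpg": 0.41}, "correlation"))
# ['a.jpg', 'b.jpg']
print(rank_by_similarity({"a.jpg": 2.34, "b.jpg": 0.12}, "chi-squared"))
# ['b.jpg', 'a.jpg']
```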