5 Best Ways to Calculate Euclidean Distance Using Scikit-learn in Python

💡 Problem Formulation: Euclidean distance is the straight-line distance between two points in Euclidean space. In data science, it’s a common way to measure how far apart two vectors are, where each vector typically represents a data point. For instance, given two points P1(1, 2) and P2(4, 6), we want to find the Euclidean distance between them, which is √((4−1)² + (6−2)²) = √25 = 5, using Python’s Scikit-learn library.
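As a sanity check that does not depend on any library, the definition can be evaluated directly. The sketch below uses only Python's standard math module; the variable names are illustrative and not part of Scikit-learn.

```python
import math

# Example points from the problem statement
p1 = (1, 2)
p2 = (4, 6)

# Euclidean distance: square root of the summed squared coordinate differences
squared_diffs = [(a - b) ** 2 for a, b in zip(p1, p2)]  # [9, 16]
distance = math.sqrt(sum(squared_diffs))
print(distance)  # 5.0
```

Every library-based method in this article should reproduce this value for the same two points.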

Method 1: Using euclidean_distances function

This Scikit-learn function returns a distance matrix containing the Euclidean distance between every row of one array and every row of another. The euclidean_distances function is a direct way to compute the distances and is well suited to cases where you have more than two vectors and need a pairwise distance matrix.

Here’s an example:

from sklearn.metrics.pairwise import euclidean_distances

# Define two 2D points
P1 = [[1, 2]]
P2 = [[4, 6]]

# Calculate Euclidean distance
dist_matrix = euclidean_distances(P1, P2)
print(dist_matrix)

Output:

[[5.]]

The euclidean_distances function takes two 2D arrays, one row per point, and returns a matrix of distances. Here, P1 and P2 are each written as a nested list holding a single point to comply with that input format, so the result is a 1×1 matrix whose only entry is the distance between the two points.
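The pairwise matrix becomes more useful with several points. The following is a minimal sketch, assuming three made-up points on a line through the origin; calling euclidean_distances with a single array compares every row with every other row.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Three illustrative points (not from the original example)
X = np.array([[0, 0], [3, 4], [6, 8]])

# With one argument, distances are computed between all pairs of rows in X
dist_matrix = euclidean_distances(X)
print(dist_matrix.shape)  # (3, 3)
print(np.round(dist_matrix, 6))
```

Entry [i, j] holds the distance between row i and row j, so the diagonal is zero and the matrix is symmetric.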

Method 2: Using pairwise_distances function with metric='euclidean'

The pairwise_distances function is similar to euclidean_distances but more flexible, allowing for other distance metrics as well. Specifying metric='euclidean' computes the Euclidean distance.

Here’s an example:

from sklearn.metrics import pairwise_distances

# Define two points
P1 = [[1, 2]]
P2 = [[4, 6]]

# Calculate Euclidean distance
dist = pairwise_distances(P1, P2, metric='euclidean')
print(dist)

Output:

[[5.]]

We used pairwise_distances with the metric parameter set to 'euclidean' to calculate the distance between P1 and P2. This method is especially useful when you may later need a different distance metric, because only the metric argument changes and the rest of the code stays the same.
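To illustrate that flexibility, here is a small sketch that reuses the same call with different metric names. The choice of 'manhattan' and 'chebyshev' as comparison metrics is ours, for illustration only; the results dictionary is a hypothetical helper.

```python
from sklearn.metrics import pairwise_distances

P1 = [[1, 2]]
P2 = [[4, 6]]

# Same call each time; only the metric name changes
results = {}
for metric in ("euclidean", "manhattan", "chebyshev"):
    dist = pairwise_distances(P1, P2, metric=metric)
    results[metric] = float(dist[0, 0])
    print(metric, results[metric])
```

For these two points the coordinate differences are 3 and 4, so the Euclidean distance is 5, the Manhattan distance is 3 + 4 = 7, and the Chebyshev distance is max(3, 4) = 4.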

Method 3: Utilizing the DistanceMetric class

The DistanceMetric class provides a more object-oriented approach to calculating distances. Its get_metric factory method returns a metric object by name, and that object’s pairwise method computes the distances.

Here’s an example:

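# DistanceMetric is importable from sklearn.metrics in scikit-learn 1.0+;
# older releases exposed it as sklearn.neighbors.DistanceMetric
from sklearn.metrics import DistanceMetric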

# Define the metric as 'euclidean'
dist = DistanceMetric.get_metric('euclidean')

# Define two points
P1 = [[1, 2]]
P2 = [[4, 6]]

# Calculate Euclidean distance
distance = dist.pairwise(P1, P2)
print(distance)

Output:

[[5.]]

By creating a DistanceMetric object with 'euclidean' as the metric name, we are able to use the pairwise method to compute the distance between P1 and P2. This method encapsulates the distance computation in an object, making it easy to reuse and manage.

Method 4: Compute with dist.pairwise method after setting the metric

This method is a continuation of the object-oriented concept. The metric object is created once with get_metric, and the points are kept as flat lists that are wrapped into 2D form only when passed to the pairwise method.

Here’s an example:

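# DistanceMetric is importable from sklearn.metrics in scikit-learn 1.0+;
# older releases exposed it as sklearn.neighbors.DistanceMetric
from sklearn.metrics import DistanceMetric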

# Define and set the metric
dist_metric = DistanceMetric.get_metric('euclidean')

# Define two points
P1 = [1, 2]
P2 = [4, 6]

# Calculate Euclidean distance
distance = dist_metric.pairwise([P1], [P2])
print(distance)

Output:

[[5.]]

Once the metric object has been created, we can call its pairwise method repeatedly. This provides a clear way to apply a consistent distance measurement across various data points, with the flexibility to change the metric by passing a different name to get_metric.
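The reuse described above can be sketched as follows. The fallback import is an assumption about the installed scikit-learn version (the class is exposed from sklearn.metrics in newer releases and from sklearn.neighbors in older ones), and the arrays origin, batch_a and batch_b are made-up data for illustration.

```python
import numpy as np

try:
    from sklearn.metrics import DistanceMetric  # newer scikit-learn releases
except ImportError:
    from sklearn.neighbors import DistanceMetric  # older releases

# Create the metric object once
dist_metric = DistanceMetric.get_metric("euclidean")

# Hypothetical data: one reference point and two separate batches
origin = np.array([[0.0, 0.0]])
batch_a = np.array([[3.0, 4.0], [6.0, 8.0]])
batch_b = np.array([[1.0, 0.0]])

# Reuse the same object for each batch
d_a = dist_metric.pairwise(origin, batch_a)  # shape (1, 2)
d_b = dist_metric.pairwise(origin, batch_b)  # shape (1, 1)
print(d_a)
print(d_b)
```

Each call returns a matrix with one row per point in the first argument and one column per point in the second.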

Bonus One-Liner Method 5: Using numpy.linalg.norm

Though not part of Scikit-learn, the NumPy library offers a concise way to calculate Euclidean distance via the norm function: the norm of the difference between two vectors is the distance between them, which gives a one-liner.

Here’s an example:

import numpy as np

# Define two points
P1 = np.array([1, 2])
P2 = np.array([4, 6])

# Calculate Euclidean distance in one-liner
distance = np.linalg.norm(P1 - P2)
print(distance)

Output:

5.0

Using NumPy’s np.linalg.norm function, we can compute the Euclidean distance by simply subtracting the two points and passing the result to the function. While it’s not part of Scikit-learn, it’s a commonly used method due to its simplicity and performance.
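The same function can handle more than one pair at a time through its axis argument. The sketch below is illustrative: the arrays A and B are made-up data, and the broadcasting step is one common way to build a full pairwise matrix with NumPy alone.

```python
import numpy as np

# Two illustrative sets of points, one point per row
A = np.array([[1, 2], [0, 0]])
B = np.array([[4, 6], [3, 4]])

# Row-wise distances: A[0] to B[0], A[1] to B[1]
row_distances = np.linalg.norm(A - B, axis=1)
print(row_distances)  # [5. 5.]

# Full pairwise matrix via broadcasting: entry [i, j] is the distance from A[i] to B[j]
pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
print(pairwise.shape)  # (2, 2)
```

The broadcasting form builds an intermediate array with one entry per pair and coordinate, so for large inputs the Scikit-learn pairwise functions may be the more convenient choice.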

Summary/Discussion

  • Method 1: euclidean_distances function. Direct calculation of distances with pairwise distance matrix support. Less flexible for different distance metrics.
  • Method 2: pairwise_distances function. Calculates pairwise distances with flexibility to switch between different metrics. Slightly more complex for simple distance calculations.
  • Method 3: Using DistanceMetric class. Object-oriented approach providing reusability and maintainability. Might be overkill for straightforward one-off computations.
  • Method 4: Compute with dist.pairwise method. A clear and consistent way to compute Euclidean distance and easily adaptable to other metrics. Requires object instantiation.
  • Bonus Method 5: Using numpy.linalg.norm. A highly efficient, non-Scikit-learn approach. Best for simplicity and speed but lacks the pairwise matrix computation convenience.