5 Best Ways to Detect Eyes in an Image Using OpenCV Python

πŸ’‘ Problem Formulation: Detecting eyes in images is a common task in computer vision, useful in various applications like facial recognition, eye-tracking, and human-computer interaction. The input is a digital image, and the desired output is the coordinates or bounding boxes around the detected eyes.

Method 1: Haar Cascade Classifier

This method uses a Haar Cascade classifier, a classic machine-learning approach to object detection that is fast enough for real-time use. OpenCV ships with pre-trained Haar Cascade models; haarcascade_eye.xml is trained specifically for detecting eyes in images.

Here’s an example:

import cv2

# Load the image and convert it to grayscale
image = cv2.imread('face.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Load the pre-trained Haar Cascade for eye detection
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

# Perform eye detection
eyes = eye_cascade.detectMultiScale(gray, 1.1, 4)

# Draw rectangles around detected eyes
for (ex, ey, ew, eh) in eyes:
    cv2.rectangle(image, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)

cv2.imshow('Detected Eyes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The output will be the input image with green rectangles drawn around detected eyes.

The code snippet performs eye detection using the Haar Cascade algorithm. It reads the input image, converts it to grayscale, and loads the pre-trained Haar Cascade eye detector. The detectMultiScale function scans the grayscale image at multiple scales and returns one bounding box per detected eye; here the scale factor is 1.1 and at least 4 neighboring detections are required per candidate. Rectangles are then drawn on the original image to mark the results.
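Detection quality depends heavily on detectMultiScale’s parameters. The values below are illustrative starting points to tune for your image resolution, not definitive settings: a smaller scaleFactor scans more scales at a higher cost, a larger minNeighbors suppresses false positives, and minSize discards implausibly small matches.

eyes = eye_cascade.detectMultiScale(
    gray,
    scaleFactor=1.05,  # finer scale steps: slower but more thorough
    minNeighbors=6,    # require more overlapping detections per candidate
    minSize=(20, 20)   # ignore matches smaller than 20x20 pixels
)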

Method 2: Deep Learning with DNN Module

Deep learning models can provide more accurate detection than Haar Cascades. OpenCV’s DNN module can run pre-trained models from frameworks such as Caffe or TensorFlow; a common pattern is to detect faces with such a model first and then locate the eyes within each face region.

Here’s an example:

import cv2
import numpy as np

# Load the image and prepare a 300x300 blob, the input size this model expects
image = cv2.imread('face.jpg')
h, w = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
                             (104.0, 177.0, 123.0))

# Load a pre-trained deep learning model for face detection
net = cv2.dnn.readNet('deploy.prototxt', 'res10_300x300_ssd_iter_140000.caffemodel')

# Perform detection
net.setInput(blob)
detections = net.forward()

# Post-process: keep confident face detections, then search each face for eyes
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        # Scale the normalized box coordinates back to the original image size
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (x1, y1, x2, y2) = box.astype(int)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # An eye detector would then run on the face crop image[y1:y2, x1:x2]
        # (a possible completion is sketched below)

cv2.imshow('Detected Eyes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The output image will show the detected face regions; the eyes are then located within those regions in a further step.

This script loads a deep learning face detector and converts the input image into a blob in the format the model expects. It then detects faces and scales the resulting bounding boxes back to the original image size. Locating the eyes inside each face still requires an additional eye detection step, which depends on the availability of a suitable model; one possible completion is sketched below.
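One minimal way to complete the pipeline, assuming the Haar eye cascade from Method 1 is an acceptable second stage, is to run it on each face crop inside the detection loop:

# Load the eye cascade once, outside the detection loop
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

# Inside the loop, after computing (x1, y1, x2, y2) for a confident face:
face_roi = cv2.cvtColor(image[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face_roi, 1.1, 4):
    # Eye coordinates are relative to the crop; offset them by the face origin
    cv2.rectangle(image, (x1 + ex, y1 + ey), (x1 + ex + ew, y1 + ey + eh),
                  (255, 0, 0), 2)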

Method 3: Eye Aspect Ratio (EAR)

The Eye Aspect Ratio (EAR) is a simple geometric measure, originally proposed for blink detection in image sequences, that can be used to decide whether detected eyes are open or closed. It relies on facial landmark detection: given the six landmarks p1..p6 around an eye, EAR = (||p2 βˆ’ p6|| + ||p3 βˆ’ p5||) / (2 Β· ||p1 βˆ’ p4||), i.e. the ratio of the two vertical eye distances to the horizontal one.

Here’s an example:

import cv2
import dlib
from scipy.spatial import distance

# Initialize dlib's face detector and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Function to calculate EAR
def eye_aspect_ratio(eye):
    # Compute the Euclidean distances between the vertical eye landmarks
    A = distance.euclidean(eye[1], eye[5])
    B = distance.euclidean(eye[2], eye[4])

    # Compute the Euclidean distance between the horizontal eye landmarks
    C = distance.euclidean(eye[0], eye[3])

    # Compute the EAR
    ear = (A + B) / (2.0 * C)
    return ear

# Detect faces in the image
image = cv2.imread('face.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector(gray)

# Locate the eyes via facial landmarks and compute the EAR for each
for face in faces:
    landmarks = predictor(gray, face)
    # Landmarks 36-41 outline the left eye, 42-47 the right eye
    leftEye = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(36, 42)]
    rightEye = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(42, 48)]
    leftEAR = eye_aspect_ratio(leftEye)
    rightEAR = eye_aspect_ratio(rightEye)

    # Mark the eye landmarks on the image
    for (x, y) in leftEye + rightEye:
        cv2.circle(image, (x, y), 2, (0, 255, 0), -1)

    # An EAR below roughly 0.2 indicates a closed or blinking eye
    if leftEAR < 0.2 and rightEAR < 0.2:
        print('Eyes closed or blinking')

cv2.imshow('Detected Eyes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The output marks the eye landmarks found in the image, and the EAR values indicate whether the eyes are open: values below the threshold suggest closed or blinking eyes.

In this method, dlib’s facial landmark detector locates six points around each eye, and the EAR is computed from those points to determine whether the eyes are open. An EAR below the threshold indicates that the eye is closed (or mid-blink), which is most useful in videos or image sequences where blinks must be detected.
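As a quick sanity check of the formula, with made-up landmark coordinates (reusing the eye_aspect_ratio function from the listing above), an open eye yields a clearly larger EAR than a closed one:

# Hypothetical (x, y) coordinates for landmarks p1..p6
open_eye = [(0, 0), (10, -5), (20, -5), (30, 0), (20, 5), (10, 5)]
closed_eye = [(0, 0), (10, -1), (20, -1), (30, 0), (20, 1), (10, 1)]

print(eye_aspect_ratio(open_eye))    # (10 + 10) / (2 * 30) = 0.33 -> open
print(eye_aspect_ratio(closed_eye))  # (2 + 2) / (2 * 30) ~= 0.07 -> closed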

Method 4: Template Matching

Template matching is a method in image processing for finding small parts of an image that match a template image. It can be particularly useful for eye detection when the eyes have a distinct appearance and the image conditions are controlled.

Here’s an example:

import cv2
import numpy as np

# Load image and template
image = cv2.imread('face.jpg')
template = cv2.imread('eye_template.jpg')
h, w = template.shape[:2]

# Convert images to grayscale
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

# Perform template matching
result = cv2.matchTemplate(image_gray, template_gray, cv2.TM_CCOEFF_NORMED)

# Set a threshold and find where the match exceeds the threshold
threshold = 0.7
locations = np.where(result >= threshold)

# Draw rectangles around matched regions
for pt in zip(*locations[::-1]):
    cv2.rectangle(image, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)

cv2.imshow('Detected Eyes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

The output will be the input image with red rectangles drawn around areas that match the eye template.

Template matching slides the eye template across the input image and scores the similarity at every position. The matchTemplate call with the normalized correlation coefficient (TM_CCOEFF_NORMED) produces a score map; positions scoring above the threshold are treated as detections and marked on the output image. Note that neighboring positions often exceed the threshold together, so a single eye is typically covered by a cluster of overlapping rectangles.
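Plain template matching also breaks down when the eyes in the image differ in size from the template. A minimal multi-scale variant, continuing from the listing above (the scale range is an illustrative assumption), rescales the template and collects matches at every scale:

matches = []
for scale in np.linspace(0.5, 1.5, 11):  # try templates from 50% to 150% size
    resized = cv2.resize(template_gray, None, fx=scale, fy=scale)
    th, tw = resized.shape[:2]
    if th > image_gray.shape[0] or tw > image_gray.shape[1]:
        continue  # skip templates larger than the image itself
    result = cv2.matchTemplate(image_gray, resized, cv2.TM_CCOEFF_NORMED)
    for pt in zip(*np.where(result >= threshold)[::-1]):
        matches.append((pt[0], pt[1], tw, th))

# Draw every match found at any scale
for (x, y, tw, th) in matches:
    cv2.rectangle(image, (x, y), (x + tw, y + th), (0, 0, 255), 2)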

Bonus One-Liner Method 5: Using MediaPipe

MediaPipe offers cross-platform, customizable ML solutions for live and streaming media. For eye detection, MediaPipe’s Face Mesh solution, which includes 468 3D facial landmarks, can accurately detect eyes with just a few lines of code.

Here’s an example:

import cv2
import mediapipe as mp

# Initialize MediaPipe Face Mesh for still-image input
mp_face_mesh = mp.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(static_image_mode=True)

# Process the image
image = cv2.imread('face.jpg')
results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# If faces are detected, the eye landmarks can be extracted from
# results.multi_face_landmarks and drawn (sketched below)

cv2.imshow('Detected Eyes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

After the landmarks are drawn, the output shows the facial landmarks detected by MediaPipe, including those around the eyes.

With MediaPipe’s Face Mesh model, eye detection reduces to processing the image and extracting the relevant landmarks. The code above does not yet draw them; that is the next step after obtaining them, sketched below.
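A minimal sketch of that step, continuing from the listing above and assuming the FACEMESH_LEFT_EYE and FACEMESH_RIGHT_EYE connection sets exposed by the face_mesh module, converts the normalized landmark coordinates to pixels and marks them:

if results.multi_face_landmarks:
    h, w = image.shape[:2]
    # Collect the landmark indices that appear in the eye connection sets
    eye_idx = {i for conn in (mp_face_mesh.FACEMESH_LEFT_EYE |
                              mp_face_mesh.FACEMESH_RIGHT_EYE) for i in conn}
    for face_landmarks in results.multi_face_landmarks:
        for i in eye_idx:
            lm = face_landmarks.landmark[i]
            # Landmark coordinates are normalized to [0, 1]; scale to pixels
            cv2.circle(image, (int(lm.x * w), int(lm.y * h)), 2, (0, 255, 0), -1)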

Summary/Discussion

  • Method 1: Haar Cascade Classifier. Easy to use and suitable for real-time applications. Pre-trained models are available but may not be as accurate as deep learning methods, especially in challenging lighting or faces at angles.
  • Method 2: Deep Learning with DNN Module. Potentially more accurate than Haar Cascades but require significant computational resources. The model’s size and architecture play a big role in performance and speed.
  • Method 3: Eye Aspect Ratio (EAR). A geometric approach that’s relatively simple and effective for detecting blinks. It requires reliably detecting facial landmarks, and performance may degrade with partial occlusion or profile faces.
  • Method 4: Template Matching. Works well under controlled conditions with limited variations. It might not perform well with different scales, rotations, or lighting conditions in the image.
  • Method 5: Using MediaPipe. Provides a robust and sophisticated approach with minimal code. However, extracting and processing facial landmarks might be compute-intensive, and it requires the MediaPipe library installed.