💡 Problem Formulation: In computer vision, detecting humans in images is a fundamental task, important for applications like surveillance, customer tracking, and advanced driver assistance systems. Given an image or video frame, the goal is to identify and localize all the human figures within. Using Python and OpenCV, this article demonstrates various methods to achieve human detection, with the expected output comprising coordinates bounding the detected human regions.
Method 1: Haar Cascades for Human Detection
This method uses Haar Cascade classifiers, a fast classical technique for object detection. OpenCV ships pre-trained Haar models for human detection that can be loaded directly. The classifier slides a detection window over the image at multiple scales and locations, looking for simple brightness-contrast features that match the human form.
Here’s an example:
import cv2

# Load the pre-trained full-body Haar cascade.
# Assumes the pip opencv-python package, which exposes cv2.data.haarcascades.
cascade_path = cv2.data.haarcascades + 'haarcascade_fullbody.xml'
human_cascade = cv2.CascadeClassifier(cascade_path)

# Read the image
image = cv2.imread('image.jpg')

# Convert to grayscale, since Haar cascades operate on grayscale images
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Perform the detection
detections = human_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

# Draw rectangles around detected humans
for (x, y, w, h) in detections:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

# Save the result
cv2.imwrite('detected.jpg', image)
The output of this code is an image with blue rectangles drawn around the detected human figures.
This snippet loads an image and a human detection Haar Cascade classifier provided by OpenCV. After converting the image to grayscale, it applies the detectMultiScale() function to find humans, and it outlines the detected areas with rectangles.
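To make the idea of "features that match the human form" more concrete, here is a minimal pure-Python sketch of a two-rectangle Haar-like feature evaluated through an integral image (summed-area table). The function names and the toy 4x4 image are illustrative assumptions, not OpenCV's internal implementation; a real cascade combines thousands of such features.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Vertical-edge feature: right-half sum minus left-half sum (w should be even)."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return right - left


# Toy 4x4 "image" (hypothetical data): dark left half, bright right half.
toy = [[0, 0, 9, 9] for _ in range(4)]
ii = integral_image(toy)
print(rect_sum(ii, 0, 0, 4, 4))          # 72: total brightness
print(two_rect_feature(ii, 0, 0, 4, 4))  # 72: strong response on the vertical edge
print(two_rect_feature(ii, 2, 0, 2, 4))  # 0: no edge inside the uniform bright region
```

Because every rectangle sum costs only four table lookups, the feature can be evaluated at any position and size in constant time, which is what makes scanning many windows affordable.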
Method 2: Histogram of Oriented Gradients (HOG) with SVM Classifier
The Histogram of Oriented Gradients (HOG) descriptor coupled with a Support Vector Machine (SVM) classifier is a robust technique for object detection, including human detection. HOG descriptors capture the edge and gradient structure that characterizes human shapes, while the SVM classifier separates these human-like shapes from the background.
Here’s an example:
import cv2
import imutils

# Initialize HOG descriptor with pre-trained person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Read the image and resize for better detection efficiency
image = cv2.imread('image.jpg')
image = imutils.resize(image, width=min(400, image.shape[1]))

# Detect humans in the image
(humans, _) = hog.detectMultiScale(image, winStride=(8, 8), padding=(16, 16), scale=1.05)

# Draw rectangles around detected humans
for (x, y, w, h) in humans:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Save the result
cv2.imwrite('detected.jpg', image)
The output is an image with green rectangles around detected human figures.
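In the code example, a pre-trained person detector is set up using HOG descriptors, and it is then applied to a resized image. Detected regions are marked with rectangles. The winStride, padding, and scale parameters control the detection process: winStride is the step in pixels between successive window positions, padding is the number of pixels added around each window before the descriptor is computed, and scale is the factor by which the image is shrunk between pyramid levels.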
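The speed and accuracy trade-off behind winStride and scale can be estimated with a small back-of-the-envelope sketch. The code below counts how many 64x128 windows (the size used by the default people detector) a sliding-window search would evaluate. It is a simplified model under stated assumptions: it ignores padding and OpenCV's internal cap on pyramid levels, and the helper names are made up for illustration.

```python
def pyramid_levels(width, height, scale=1.05, win=(64, 128)):
    """Yield (level_width, level_height) while the detection window still fits."""
    factor = 1.0
    while True:
        w, h = int(width / factor), int(height / factor)
        if w < win[0] or h < win[1]:
            break
        yield w, h
        factor *= scale


def windows_per_level(w, h, win=(64, 128), stride=(8, 8)):
    """Number of window positions on one pyramid level, ignoring padding."""
    nx = (w - win[0]) // stride[0] + 1
    ny = (h - win[1]) // stride[1] + 1
    return nx * ny


def total_windows(width, height, scale=1.05, stride=(8, 8)):
    """Total windows evaluated across all pyramid levels."""
    return sum(windows_per_level(w, h, stride=stride)
               for w, h in pyramid_levels(width, height, scale))


# Compare a fine search with two coarser alternatives on a 400x300 image.
print(total_windows(400, 300, scale=1.05, stride=(8, 8)))
print(total_windows(400, 300, scale=1.05, stride=(16, 16)))
print(total_windows(400, 300, scale=1.5, stride=(8, 8)))
```

A larger stride or a larger scale factor reduces the window count, and therefore the runtime, at the cost of possibly missing people who fall between window positions or between pyramid levels.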
Method 3: Deep Learning with OpenCV’s DNN Module
OpenCV’s Deep Neural Network (DNN) module allows loading pre-trained deep learning models for object detection, including human detection. Models such as Single Shot MultiBox Detector (SSD) or Faster R-CNN can be used, which have been trained on large datasets like COCO or ImageNet and can provide very accurate detections.
Here’s an example:
import cv2
import numpy as np

# Load the model definition and the weights.
# Note: res10_300x300_ssd is OpenCV's face detector; to box whole bodies,
# swap in a person-trained SSD and keep only its person class.
net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'res10_300x300_ssd_iter_140000.caffemodel')

# Read the image
image = cv2.imread('image.jpg')
(h, w) = image.shape[:2]

# Create a blob and pass it through the network
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

# Loop over the detections
for i in range(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    # Filter out weak detections
    if confidence > 0.7:
        # Scale the normalized box back to image coordinates
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # Draw the bounding box
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)

# Save the result
cv2.imwrite('detected.jpg', image)
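The output is an image in which every detection whose confidence exceeds the threshold (0.7 in this case) is surrounded by a red rectangle. Note that res10_300x300_ssd_iter_140000.caffemodel is the face detector distributed with OpenCV's samples, so with this exact file the boxes enclose faces rather than whole bodies; for full-body detection, load a person-trained SSD (MobileNet-SSD is a common choice) and keep only its person class.
This example uses a Caffe-based deep learning model through OpenCV's DNN module. A blob is created from the image and passed through the network to obtain the detections, which are then filtered by confidence and drawn as red bounding boxes.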
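SSD-style networks report each box as normalized (x1, y1, x2, y2) values, which is why the example multiplies by the image width and height. The values can fall slightly outside the 0 to 1 range, so a small helper that scales and clamps them may be useful before drawing or cropping. The function below is a hypothetical sketch, not part of OpenCV.

```python
def to_pixel_box(norm_box, width, height):
    """Map a normalized (x1, y1, x2, y2) box to clamped integer pixel corners."""
    def clamp(value, upper):
        # Keep the coordinate inside [0, upper] after truncating to an int.
        return max(0, min(int(value), upper))

    x1, y1, x2, y2 = norm_box
    return (clamp(x1 * width, width - 1), clamp(y1 * height, height - 1),
            clamp(x2 * width, width - 1), clamp(y2 * height, height - 1))


# A box fully inside a 640x480 image, and one that spills past the borders.
print(to_pixel_box((0.25, 0.5, 0.75, 1.0), 640, 480))  # (160, 240, 480, 479)
print(to_pixel_box((-0.1, 0.0, 1.2, 0.5), 640, 480))   # (0, 0, 639, 240)
```

Clamping matters mostly when the box is later used to slice the image array, where negative indices would silently wrap around.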
Method 4: Using a Pre-Trained YOLO (You Only Look Once) Model with OpenCV
YOLO is a popular real-time object detection system that applies a single neural network to the full image, thus enabling it to predict bounding boxes and class probabilities directly from full images in one evaluation. OpenCV provides interfaces to work with YOLO models easily.
Here’s an example:
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')
# Ask OpenCV for the output layer names directly; indexing the result of
# getUnconnectedOutLayers() with i[0] fails on newer OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

# Load image
image = cv2.imread('image.jpg')
height, width, channels = image.shape

# Create blob and do forward pass
blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Information for each object detected
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5 and class_id == 0:  # Class ID 0 is "person" in COCO
            # Object detected: box is given as normalized center and size
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates (top-left corner)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)

# Save the result
cv2.imwrite('detected.jpg', image)
The output is an image with yellow rectangles around detected human figures, provided they have a confidence above 0.5 and are identified as the person class. (OpenCV colors are in BGR order, so (0, 255, 255) is yellow.)
This snippet uses a trained YOLO model to detect human figures. It processes the image, keeps person-class detections with sufficient confidence, and draws a yellow rectangle around each one.
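YOLO usually emits several overlapping boxes for the same person, and the snippet above draws all of them. A common remedy is non-maximum suppression (NMS); OpenCV offers cv2.dnn.NMSBoxes for this. The pure-Python sketch below shows the greedy idea on (x, y, w, h) boxes. The function names, the 0.4 threshold, and the sample boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same person.
boxes = [(100, 100, 50, 120), (104, 98, 50, 120), (300, 80, 60, 140)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # [0, 2]
```

To use this with the YOLO loop, one would collect each (x, y, w, h) box and its confidence into lists, run NMS once after the loop, and draw only the kept indices.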
Bonus One-Liner Method 5: Pre-Built Deep Learning Image Detector
A compact approach to human detection uses solutions that package deep learning models behind single-function APIs that require just a few lines of code to perform object detection tasks.
Here’s an example:
from imageai.Detection import ObjectDetection

# Configure a RetinaNet detector (the .h5 weights file matches the ImageAI 2.x API)
detector = ObjectDetection()
detector.setModelTypeAsRetinaNet()
detector.setModelPath('resnet50_coco_best_v2.1.0.h5')
detector.loadModel()

# Detect objects and save an annotated copy of the image
detections = detector.detectObjectsFromImage(input_image='image.jpg', output_image_path='detected.jpg', minimum_percentage_probability=70)

# Print the bounding box of every detected person
for detection in detections:
    if detection['name'] == 'person':
        print(detection['box_points'])
The output is a printed list of bounding-box coordinates for the detected humans, along with a saved image in which all detected objects are annotated.
By using ImageAI's ObjectDetection class, this code runs a RetinaNet model with a minimum probability threshold of 70 percent and prints the box of every detection labeled as a person.
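The earlier OpenCV methods produce boxes as (x, y, w, h), while ImageAI reports corner points. The sketch below shows one way to normalize the results so all five methods yield the same shape of output. It assumes each detection is a dict with the keys name, percentage_probability, and box_points (as [x1, y1, x2, y2]); the helper name and the sample data are hypothetical.

```python
def person_boxes(detections, min_probability=70.0):
    """Return (x, y, w, h) boxes for detections labeled 'person'."""
    boxes = []
    for det in detections:
        if det["name"] != "person":
            continue
        if det["percentage_probability"] < min_probability:
            continue
        x1, y1, x2, y2 = det["box_points"]
        boxes.append((x1, y1, x2 - x1, y2 - y1))
    return boxes


# Made-up detections in the assumed dict format, for illustration only.
sample = [
    {"name": "person", "percentage_probability": 91.2, "box_points": [40, 30, 120, 260]},
    {"name": "dog", "percentage_probability": 88.0, "box_points": [200, 180, 300, 260]},
    {"name": "person", "percentage_probability": 55.0, "box_points": [310, 40, 380, 250]},
]
print(person_boxes(sample))  # [(40, 30, 80, 230)]
```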
Summary/Discussion
Method 1: Haar Cascades. Strengths: Quick, good for static camera angles, simple to use. Weaknesses: Less accurate with variable lighting or overlapping figures.
Method 2: HOG with SVM. Strengths: Effective at capturing human form, fairly robust to variations. Weaknesses: Can be slower and less accurate than deep learning methods.
Method 3: DNN Module. Strengths: Highly accurate, uses state-of-the-art deep learning models. Weaknesses: May require more computational resources, complexity in implementation.
Method 4: YOLO. Strengths: Real-time detection, very accurate, can detect humans along with other objects. Weaknesses: Can be resource-intensive.
Method 5: Pre-Built Deep Learning Detector. Strengths: Extremely simple implementation, no need to handle pre/post-processing. Weaknesses: Less control over the detection process, model may not be as customizable.