Detect Faces with Increased Accuracy: Benchmarking Sightcorp’s New Deep Learning-Based Face Detector
Robust face detection is at the heart of Sightcorp’s products and SDKs. Our customers deploy our technology in a variety of locations and conditions. We are expected to detect faces across different video resolutions, distances, lighting conditions, camera angles, and head poses. It is therefore important for us to define quantitative measures that evaluate our detector under each of these conditions.
In this post, we’d like to share how we measure performance of face detection across variations in head pose.
Benchmarking our face detector in the context of head pose variation
Most modern face datasets already include annotations that give some indication of head pose. For example:
- The WIDER FACE dataset tags faces as having ‘typical pose’ or ‘atypical pose’.
- The VGGFace dataset divides poses into ‘front’, ‘three-quarter’ and ‘profile’.
While benchmarking on these datasets gives us a good sense of how our face detection holds up across variation in pose, we wanted to go a step further and quantify our detection performance across granular variation in yaw, pitch, and roll. This would allow us to identify the values of yaw, pitch, and roll at which the detector fails, and the cut-off beyond which we need to improve the robustness of our detection.
We repurposed a couple of head pose estimation datasets for this.
The first dataset we used was the Head Pose Image Database from the Prima Project at INRIA Rhone-Alpes:
The head pose database is a benchmark of 2790 monocular face images of 15 persons with variations of pan and tilt angles from -90 to +90 degrees. For every person, 2 series of 93 images (93 different poses) are available.
Here is what the data from the database looks like for one of the subjects:
This dataset encodes yaw and pitch into filenames. We wrote a little Python script to create a ground truth CSV file with a path to each file in this dataset and its corresponding pitch and yaw values (showing just the filename below to keep the table compact):
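A minimal sketch of such a script, not the one we actually ran. It assumes that each filename ends in two signed integers, tilt (pitch) followed by pan (yaw), as in the hypothetical name `personne01146+0-90.jpg`. The function names and the column names `path`, `pitch`, and `yaw` are illustrative, so check the naming scheme against the dataset documentation before relying on it.

```python
import csv
import re
from pathlib import Path

# Assumed layout: the name ends in <tilt><pan>.jpg, both signed, e.g. "+0-90".
POSE_RE = re.compile(r"([+-]\d+)([+-]\d+)\.jpg$", re.IGNORECASE)


def parse_pose(filename):
    """Return (pitch, yaw) in degrees, or None if no pose suffix is found."""
    match = POSE_RE.search(filename)
    if match is None:
        return None
    tilt, pan = match.groups()
    return int(tilt), int(pan)


def write_ground_truth(image_dir, csv_path):
    """Write one (path, pitch, yaw) row per image and return the row count."""
    rows = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "pitch", "yaw"])
        for path in sorted(Path(image_dir).rglob("*.jpg")):
            pose = parse_pose(path.name)
            if pose is not None:
                writer.writerow([str(path), pose[0], pose[1]])
                rows += 1
    return rows
```

Calling `write_ground_truth("HeadPoseImageDatabase", "ground_truth.csv")` (both paths are placeholders) would produce the ground truth file that the rest of the benchmark reads.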
It’s worth summarising this dataset by yaw and pitch. I turn to R and dplyr for this sort of analysis:
So for each combination of yaw and pitch, there are 30 images (15 subjects × 2 series). Since we know that each image has only one subject, the face detector should detect 30 faces for every combination of yaw and pitch.
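The same per-pose count can be sketched in Python for readers who do not use R. This is an illustrative equivalent, not the dplyr code from our analysis. It assumes the rows come from `csv.DictReader` over a ground truth file with `yaw` and `pitch` columns, and the function names are our own.

```python
from collections import Counter


def images_per_pose(rows):
    """Count images for each (yaw, pitch) combination."""
    counts = Counter()
    for row in rows:
        counts[(int(row["yaw"]), int(row["pitch"]))] += 1
    return counts


def incomplete_poses(counts, expected=30):
    """Return the poses whose image count differs from the expected value."""
    return {pose: n for pose, n in counts.items() if n != expected}
```

An empty result from `incomplete_poses` confirms that every pose has the expected 30 images before the benchmark is run.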
Our benchmarking script reads this file, runs the face detector on each image under the path column, and outputs the number of faces detected. If exactly one face is detected, we also output the face rectangle (x, y, width, height). For example, here is a sample of what we got when we ran the benchmark using the Haar Cascade based face detector that ships with OpenCV:
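The loop described above could look roughly like the sketch below. It is not our production benchmarking script. The detector is passed in as a plain callable so that any detector can be plugged in, and `load_haar_detector` shows one way to wrap the frontal-face Haar cascade bundled with the `opencv-python` package. The `scaleFactor` and `minNeighbors` values are commonly used settings and are assumptions here, not the ones used for the results in this post.

```python
def load_haar_detector():
    """Wrap OpenCV's bundled frontal-face Haar cascade as detect(path)."""
    import cv2  # imported lazily so run_benchmark works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    def detect(path):
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if image is None:  # unreadable file: report no faces
            return []
        faces = cascade.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)
        return [tuple(int(v) for v in face) for face in faces]

    return detect


def run_benchmark(rows, detect):
    """Run detect on every row's path and record the face count.

    The rectangle (x, y, width, height) is filled in only when exactly
    one face is found; otherwise those fields stay empty.
    """
    results = []
    for row in rows:
        faces = detect(row["path"])
        result = dict(row, faces_detected=len(faces), x="", y="", width="", height="")
        if len(faces) == 1:
            result["x"], result["y"], result["width"], result["height"] = faces[0]
        results.append(result)
    return results
```

Keeping the detector behind a single callable means the same loop can benchmark two detectors on identical input, which is what makes the side-by-side comparison fair.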
To understand how the Haar detector performs for different values of yaw and pitch, we can summarize this data:
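As a rough Python equivalent of that summary, the sketch below groups the benchmark rows by yaw and pitch and computes a detection rate. It assumes each row carries `yaw`, `pitch`, and `faces_detected` fields (names chosen for illustration), and it treats only a count of exactly one face as a correct detection, which is a choice on our part rather than something the dataset dictates.

```python
from collections import defaultdict


def detection_summary(results):
    """Group benchmark rows by (yaw, pitch) and compute a detection rate."""
    images = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        pose = (int(row["yaw"]), int(row["pitch"]))
        images[pose] += 1
        # Assumption: every image holds one subject, so only a count of
        # exactly one face is treated as a correct detection.
        if int(row["faces_detected"]) == 1:
            hits[pose] += 1
    return {
        pose: {"images": n, "detected": hits[pose], "rate": hits[pose] / n}
        for pose, n in sorted(images.items())
    }
```

Counting zero faces and multiple faces both as misses keeps false positives from inflating the rate for a pose.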
At first glance, the Haar detector has a hard time finding faces at pitch and yaw values that correspond to non-frontal head poses. We can visualize this data as a heat map to get the full picture:
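One way to draw such a heat map in Python is sketched below, assuming the per-pose rates are held in a dictionary keyed by `(yaw, pitch)`. It uses matplotlib, which is our choice for this illustration and not necessarily the tool behind the figures in this post. Poses with no images are left as NaN so they show up as gaps rather than as zero detections.

```python
import math


def to_grid(rates):
    """Pivot {(yaw, pitch): rate} into sorted axes and a pitch-by-yaw grid."""
    yaws = sorted({yaw for yaw, _ in rates})
    pitches = sorted({pitch for _, pitch in rates}, reverse=True)
    grid = [
        [rates.get((yaw, pitch), math.nan) for yaw in yaws]
        for pitch in pitches
    ]
    return yaws, pitches, grid


def plot_heat_map(rates, out_path, title=""):
    """Save a heat map of detection rate with yaw on x and pitch on y."""
    import matplotlib  # imported lazily so to_grid works without matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    yaws, pitches, grid = to_grid(rates)
    fig, ax = plt.subplots()
    image = ax.imshow(grid, vmin=0.0, vmax=1.0, cmap="viridis")
    ax.set_xticks(range(len(yaws)))
    ax.set_xticklabels(yaws)
    ax.set_yticks(range(len(pitches)))
    ax.set_yticklabels(pitches)
    ax.set_xlabel("yaw (degrees)")
    ax.set_ylabel("pitch (degrees)")
    ax.set_title(title)
    fig.colorbar(image, ax=ax, label="detection rate")
    fig.savefig(out_path)
    plt.close(fig)
```

Fixing the color scale to the range 0 to 1 keeps heat maps for different detectors directly comparable.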
And here are the heat maps showing the performance of the Haar Cascade-based face detector vs our own face detector:
(yaw = the angle at which the head is turned towards the side)
(pitch = the angle at which the head is facing up or down)
As we can see from the heat map, the Haar Cascade detector performs reasonably well for frontal and slightly sideways faces, while detections decline rapidly for more extreme head poses.
Running the R snippet above on the results generated by Sightcorp’s deep learning face detector gives us:
Sightcorp’s deep learning-based face detector
As you’d expect for this class of computer vision problems, our deep learning-based approach far outperforms the “classical” method. These visualizations also give us actionable insights into the head poses where our face detector can do even better.
Here are some examples of head poses where the Haar approach does not detect a face, but where our deep learning-based approach does:
How you can use our deep learning-based face detector
Our deep learning-based face detector is available as part of our face analysis solution, DeepSight SDK.
With DeepSight SDK, you can:
- Accurately count the number of people that enter a particular area or view a particular digital display.
- Gather demographic data, such as age and gender.
- Ensure privacy by default by incorporating Face Blur.