In OpenCV's Haar cascade files, what are the "left" and "right" values, and how do they relate to the "threshold" value? Thanks!
Just for reference, here's the structure of the files:
<haarcascade_frontalface_alt type_id="opencv-haar-classifier">
<size>20 20</size>
<stages>
<_>
<!-- stage 0 -->
<trees>
<_>
<!-- tree 0 -->
<_>
<!-- root node -->
<feature>
<rects>
<_>3 7 14 4 -1.</_>
<_>3 9 14 2 2.</_></rects>
<tilted>0</tilted></feature>
<threshold>4.0141958743333817e-003</threshold>
<left_val>0.0337941907346249</left_val>
<right_val>0.8378106951713562</right_val></_></_>
<_>
Haar cascade is a machine learning-based approach in which a cascade function is trained on many positive and negative images and is then used to detect objects in other images. Positive images contain the object we want the classifier to identify. Negative images are images of everything else and do not contain that object.
So what is a Haar cascade? It is an object detection algorithm used to identify faces in an image or in real-time video. The algorithm uses the edge and line detection features proposed by Viola and Jones in their 2001 paper “Rapid Object Detection using a Boosted Cascade of Simple Features”.
The Haar cascade classifier detects more faces in an image than the LBP classifier. Table 1 and Table 2 contain the execution time, number of faces detected, and accuracy of both classifiers.
The "left" and "right" refer to the gradient values of a particular shape. These shapes are not specifically a left rectangle and a right rectangle. Instead, the names just refer to sections of a particular configuration, and a configuration can have more than two sections. There is a diagram in the Viola and Jones paper which helps explain this.
Here is an ASCII representation (= is filled, - is unfilled):
==== ==-- =--=
==== ==-- =--=
---- ==-- =--=
---- ==-- =--=
Overall, the naming is a bad convention. The values would be clearer if they were named "gradient top" and "gradient bottom" for the first shape (2 sections), "gradient left" and "gradient right" for the second (2 sections), and "gradient left", "gradient center", and "gradient right" for the third (3 sections). Rotated, edge, and other shapes should likewise be named so that each section is uniquely identified.
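To make the rectangle lines in the XML concrete, here is a minimal sketch of how a feature value could be computed from entries such as "3 7 14 4 -1." and "3 9 14 2 2.". It assumes each entry means x, y, width, height, weight, and that pixel sums come from an integral image (summed-area table). The names WeightedRect, IntegralImage, and featureValue are made up for this illustration and are not OpenCV's.

```cpp
#include <cassert>
#include <vector>

// One weighted rectangle of a feature, e.g. "<_>3 7 14 4 -1.</_>".
// Assumed field order: x, y, width, height, weight.
struct WeightedRect {
    int x, y, width, height;
    double weight;
};

// Summed-area table with an extra zero row and column:
// at(x, y) is the sum of all pixels strictly above and left of (x, y).
struct IntegralImage {
    int width, height;          // size of the source window in pixels
    std::vector<double> table;  // (width + 1) * (height + 1) entries

    IntegralImage(const std::vector<double>& pixels, int w, int h)
        : width(w), height(h), table((w + 1) * (h + 1), 0.0) {
        assert(static_cast<int>(pixels.size()) == w * h);
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                table[(y + 1) * (w + 1) + (x + 1)] =
                    pixels[y * w + x] + at(x, y + 1) + at(x + 1, y) - at(x, y);
            }
        }
    }

    double at(int x, int y) const { return table[y * (width + 1) + x]; }

    // Sum of the pixels inside the rectangle, using four table lookups.
    double rectSum(const WeightedRect& r) const {
        return at(r.x + r.width, r.y + r.height) - at(r.x, r.y + r.height)
             - at(r.x + r.width, r.y) + at(r.x, r.y);
    }
};

// Feature value: the weighted sum of the rectangle sums.
double featureValue(const IntegralImage& ii,
                    const std::vector<WeightedRect>& rects) {
    double sum = 0.0;
    for (const WeightedRect& r : rects) {
        sum += r.weight * ii.rectSum(r);
    }
    return sum;
}
```

With the two rectangles from the question, a perfectly flat 20x20 window gives a feature value of 0, because the 14x4 rectangle with weight -1 and the 14x2 rectangle with weight 2 cancel out. A window that is brighter in the lower half of the 14x4 region gives a positive value.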
In the OpenCV source code, you will find cvhaar.cpp, which gives some insight into how the Haar cascade works. Unfortunately, there is essentially no commentary, nor does the documentation help much. Here's my understanding of how it works.
In the function icvEvalHidHaarClassifier(), the sum is computed for the features of a single CvHidHaarTreeNode.
If this sum is less than the threshold, the "left" node is followed, and the process is repeated. Otherwise, the "right" node is followed, again repeating. This is reflected by the following statement:
idx = sum < t ? node->left : node->right;
The loop is broken when the "left" or "right" index that was followed is zero or negative (the loop condition is idx > 0). In this case, no further sum is computed, and the output value stored for that leaf, classifier->alpha[-idx], is returned as the result of the classifier.
I put "left" and "right" in quotes because, as you say, they have nothing to do with the feature position. Instead, they reflect which way the cascade "falls": below the threshold, the cascade falls left; at or above the threshold, it falls right.
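The walk described above can be sketched as follows. This is a simplified illustration, not the actual OpenCV function. It assumes the feature sums have already been computed for the current window and are passed in as featureValues. As far as I can tell, the real code also scales each threshold by a per-window normalization factor, and the sketch leaves that out. TreeNode, Tree, and evalTree are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One decision node. An index > 0 names another node in the same tree;
// an index <= 0 is a leaf, and its negation indexes into leafValues.
struct TreeNode {
    double threshold;
    int left;   // followed when the feature value is below the threshold
    int right;  // followed otherwise
};

struct Tree {
    std::vector<TreeNode> nodes;     // node 0 is the root
    std::vector<double> leafValues;  // the left_val / right_val numbers
};

// Walks the tree from the root and returns the value of the leaf reached.
// featureValues[i] is the (assumed precomputed) feature sum for node i.
double evalTree(const Tree& tree, const std::vector<double>& featureValues) {
    assert(featureValues.size() == tree.nodes.size());
    int idx = 0;
    do {
        const TreeNode& node = tree.nodes[static_cast<std::size_t>(idx)];
        double sum = featureValues[static_cast<std::size_t>(idx)];
        idx = sum < node.threshold ? node.left : node.right;
    } while (idx > 0);
    return tree.leafValues[static_cast<std::size_t>(-idx)];
}
```

For the single-node tree in the question, the root would have left = 0 and right = -1, with leafValues holding 0.0337941907346249 and 0.8378106951713562. A feature sum below the threshold then returns the first value, and any other sum returns the second.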
Let us now step back to the representation of these nodes. In the XML, you will see the representation of the nodes not as indexes, but as values:
<left_val>0.0337941907346249</left_val>
<right_val>0.8378106951713562</right_val>
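These numbers are the leaf output values themselves. What gets looked up by name with cvGetFileNodeByName() are the tags left_val and right_val, and the values read from them are stored in the classifier's alpha array. I don't know every detail of how this works inside OpenCV, but I hope you now at least have a better idea of how the cascade works.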
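As a rough sketch of how values read from the XML could end up in an index-based tree, the code below packs each leaf value into an array and stores its negated position as the branch index. This follows my reading of the loader and is an assumption, not a copy of OpenCV's code. In particular, the idea that a branch holds either a value (a *_val tag) or a child index (a *_node tag) is assumed, and XmlBranch, XmlNode, PackedTree, and pack are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One branch of a node as parsed from the XML (hypothetical struct).
struct XmlBranch {
    bool isLeaf;   // true for a *_val tag, false for a *_node tag (assumed)
    double value;  // leaf output, used when isLeaf is true
    int child;     // index of the child node, used when isLeaf is false
};

struct XmlNode {
    double threshold;
    XmlBranch left, right;
};

struct PackedNode {
    double threshold;
    int left, right;  // > 0: child node index; <= 0: negated leaf position
};

struct PackedTree {
    std::vector<PackedNode> nodes;
    std::vector<double> leafValues;
};

// Converts parsed XML nodes into the index-based form used during evaluation.
PackedTree pack(const std::vector<XmlNode>& xmlNodes) {
    PackedTree tree;
    auto encode = [&tree](const XmlBranch& b) {
        if (!b.isLeaf) {
            assert(b.child > 0);  // index 0 is the root, never a child
            return b.child;
        }
        tree.leafValues.push_back(b.value);
        return -static_cast<int>(tree.leafValues.size() - 1);
    };
    for (const XmlNode& n : xmlNodes) {
        int left = encode(n.left);
        int right = encode(n.right);
        tree.nodes.push_back({n.threshold, left, right});
    }
    return tree;
}
```

Under these assumptions, the root node from the question packs to left = 0 and right = -1, with the two XML values stored at positions 0 and 1 of leafValues. Because valid child indexes are always positive, a branch index of zero or below can only mean a leaf.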