
Viola-Jones' face detection claims 180k features

Tags:

algorithm

image-processing

computer-vision

face-detection

viola-jones

I've been implementing an adaptation of Viola-Jones' face detection algorithm. The technique relies upon placing a subframe of 24x24 pixels within an image, and subsequently placing rectangular features inside it in every position with every size possible.

These features can consist of two, three or four rectangles. The following example is presented.

[Figure: Rectangle features (https://i.stack.imgur.com/5MKl7.png)]

They claim the exhaustive set is more than 180k (section 2):

"Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large, over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete."

The following statements are not explicitly stated in the paper, so they are assumptions on my part:

1. There are only 2 two-rectangle features, 2 three-rectangle features and 1 four-rectangle feature. The logic behind this is that we are observing the difference between the highlighted rectangles, not explicitly the color or luminance or anything of that sort.
2. We cannot define feature type A as a 1x1 pixel block; it must be at least 1x2 pixels. Likewise, type D must be at least 2x2 pixels, and the same rule applies to the other features.
3. We cannot define feature type A as a 1x3 pixel block, as the middle pixel cannot be partitioned, and subtracting it from itself is identical to a 1x2 pixel block; this feature type is only defined for even widths. Likewise, the width of feature type C must be divisible by 3, and the same rule applies to the other features.
4. We cannot define a feature with a width and/or height of 0. Therefore, we iterate x and y to 24 minus the size of the feature.

Based upon these assumptions, I've counted the exhaustive set:

```cpp
const int frameSize = 24;
const int features = 5;
// All five feature types:
const int feature[features][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};

int count = 0;
// Each feature:
for (int i = 0; i < features; i++) {
    int sizeX = feature[i][0];
    int sizeY = feature[i][1];
    // Each position:
    for (int x = 0; x <= frameSize-sizeX; x++) {
        for (int y = 0; y <= frameSize-sizeY; y++) {
            // Each size fitting within the frameSize:
            for (int width = sizeX; width <= frameSize-x; width+=sizeX) {
                for (int height = sizeY; height <= frameSize-y; height+=sizeY) {
                    count++;
                }
            }
        }
    }
}
```

The result is 162,336.

The only way I found to approximate the "over 180,000" that Viola & Jones speak of is to drop assumption #4 and introduce bugs into the code. This involves changing the width and height loops respectively to:

```cpp
for (int width = 0; width < frameSize-x; width+=sizeX)
for (int height = 0; height < frameSize-y; height+=sizeY)
```

The result is then 180,625. (Note that this will effectively prevent the features from ever touching the right and/or bottom of the subframe.)

Now of course the question: have they made a mistake in their implementation? Does it make any sense to consider features with a surface of zero? Or am I seeing it the wrong way?

967

asked Nov 10 '09 12:11

Paul Lammertsma

2 Answers

There is still some confusion in Viola and Jones' papers.

In their CVPR'01 paper it is clearly stated that

"More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sum of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally a four-rectangle feature".

In the IJCV'04 paper, exactly the same thing is said. So altogether, there are 4 features. But strangely enough, this time they state that the exhaustive feature set is 45396! That does not seem to be the final version. My guess is that some additional constraints were introduced there, such as min_width, min_height, width/height ratio, and even position.

Note that both papers are downloadable on his webpage.

answered Sep 22 '22 04:09

Laoma from Singapore

On closer inspection, your code looks correct to me, which makes one wonder whether the original authors had an off-by-one bug. I guess someone ought to look at how OpenCV implements it!

Nonetheless, one suggestion to make it easier to understand is to flip the order of the for loops by going over all sizes first, then looping over the possible locations given the size:

```c
#include <stdio.h>
int main()
{
    int i, x, y, sizeX, sizeY, width, height, count, c;

    /* All five shape types */
    const int features = 5;
    const int feature[][2] = {{2,1}, {1,2}, {3,1}, {1,3}, {2,2}};
    const int frameSize = 24;

    count = 0;
    /* Each shape */
    for (i = 0; i < features; i++) {
        sizeX = feature[i][0];
        sizeY = feature[i][1];
        printf("%dx%d shapes:\n", sizeX, sizeY);

        /* each size (multiples of basic shapes) */
        for (width = sizeX; width <= frameSize; width+=sizeX) {
            for (height = sizeY; height <= frameSize; height+=sizeY) {
                printf("\tsize: %dx%d => ", width, height);
                c=count;

                /* each possible position given size */
                for (x = 0; x <= frameSize-width; x++) {
                    for (y = 0; y <= frameSize-height; y++) {
                        count++;
                    }
                }
                printf("count: %d\n", count-c);
            }
        }
    }
    printf("%d\n", count);

    return 0;
}
```

This gives the same result as before: 162336.

To verify it, I tested the case of a 4x4 window and manually checked all cases (easy to count since 1x2/2x1 and 1x3/3x1 shapes are the same only 90 degrees rotated):

```
2x1 shapes:
        size: 2x1 => count: 12
        size: 2x2 => count: 9
        size: 2x3 => count: 6
        size: 2x4 => count: 3
        size: 4x1 => count: 4
        size: 4x2 => count: 3
        size: 4x3 => count: 2
        size: 4x4 => count: 1
1x2 shapes:
        size: 1x2 => count: 12             +-----------------------+
        size: 1x4 => count: 4              |     |     |     |     |
        size: 2x2 => count: 9              |     |     |     |     |
        size: 2x4 => count: 3              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x4 => count: 2              |     |     |     |     |
        size: 4x2 => count: 3              +-----+-----+-----+-----+
        size: 4x4 => count: 1              |     |     |     |     |
3x1 shapes:                                |     |     |     |     |
        size: 3x1 => count: 8              +-----+-----+-----+-----+
        size: 3x2 => count: 6              |     |     |     |     |
        size: 3x3 => count: 4              |     |     |     |     |
        size: 3x4 => count: 2              +-----------------------+
1x3 shapes:
        size: 1x3 => count: 8                  Total Count = 136
        size: 2x3 => count: 6
        size: 3x3 => count: 4
        size: 4x3 => count: 2
2x2 shapes:
        size: 2x2 => count: 9
        size: 2x4 => count: 3
        size: 4x2 => count: 3
        size: 4x4 => count: 1
```

155

answered Sep 19 '22 04:09

Amro
