I'm not quite sure how to implement the "Bag of Words" approach with HOG descriptors. I've checked several sources which usually provide several steps to follow:
The step which involves magic (3) is not really clear. If I don't use OpenCV, how would I implement it?
The HOGs are vectors which are calculated cell-wise. So I have a vector for each cell. I could iterate over the vector and calculate the closest centroid for each element of the vector and create the histogram accordingly. Would this be a proper way to do it? But if so, I still have vectors of different sizes and no benefit from it.
Main steps can be expressed;
1- Extract features from your entire training set. (HOG feature for your aim)
2- Cluster those features into a vocabulary V; you get K distinct cluster centers.(K-Means, K-Medoid. Your hyperparameter will be K)
3- Encode each training image as a histogram of the number of times each vocabulary element shows up in the image. Each image is then represented by a length-K vector.
For example; first element of K maybe occurs 5 times, second element of K maybe occurs 10 times in your image. Doesn't matter at the end you will have a vector which has K elements.
K[0] = 5 k[1] = 10 .... .... K[n] = 3
4- Train the classifier using this vector. (Linear SVM)
When given a test image, extract the features. Now represent the test image as a histogram of the number of times each cluster center from V was closest to a feature in the test image. This is a length K vector again.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With