
Understanding encoding of extracted features

The encoding I am focusing on is Fisher encoding, as it has shown the best results in my work. I want to test Fisher encoding on my extracted SIFT features and compare the performance of the system with and without encoding.

Rather than starting from scratch, I found that VLFeat has a built-in library for Fisher encoding, and they have a tutorial for it as well, linked here.

Now I have already done most of what is required, but what actually gets encoded confuses me. For example, the tutorial makes it clear that Fisher encoding is performed using the parameters obtained from a GMM, namely [means, covariances, priors], and that the extracted SIFT features are to be used to fit the GMM. As per the tutorial:

The Fisher encoding uses a GMM to construct a visual word dictionary. To exemplify constructing a GMM, consider a number of 2-dimensional data points. In practice, these points would be a collection of SIFT or other local image features.

numFeatures = 5000 ;                  % number of (toy) descriptors
dimension = 2 ;                       % descriptor dimensionality
data = rand(dimension, numFeatures) ; % one descriptor per column

numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters) ;
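
For reference (this is my understanding of the VLFeat docs, not stated in the tutorial itself), the parameters come back with one mixture component per column:

size(means)        % dimension x numClusters -> 2 x 30
size(covariances)  % dimension x numClusters (diagonal covariances) -> 2 x 30
size(priors)       % a vector of numClusters mixture weights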

Then, once I have performed this step, I am supposed to encode another data set? This is what confuses me: I have already used my extracted SIFT features to generate the GMM parameters.

Next, we create another random set of vectors, which should be encoded using the Fisher Vector representation and the GMM just obtained:

encoding = vl_fisher(datatoBeEncoded, means, covariances, priors);

So here encoding is the final result, but WHAT has been encoded? I want the SIFT features extracted from my images to be encoded, but if I follow the tutorial, those were already used to fit the GMM. If that is the case, then what is datatoBeEncoded? Am I supposed to use the SIFT features here again?

Thank you

Update:

@Shai

Thank you, but I believe I must be doing something wrong. I don't quite understand what you mean by "compare images to themselves". I have 4 classes, with 1000 images in each class. I used the first 600 images from class 1 to learn the GMM parameters, and then used these parameters to encode the Fisher vectors:

numClusters = 128 ;
[means, covariances, priors] = vl_gmm(data, numClusters);

So means and covariances are each of size 128 x 128, and priors is of size 1 x 128.

Now when I use these to encode the Fisher vectors for the 400 remaining images with the function

encoding = vl_fisher(datatoBeEncoded, means, covariances, priors);

the size of each encoding is very different, something like 12000 x 1. These vectors cannot be compared to the model parameters generated above.

I already had a system that worked on the non-encoded version of the dataset, and it performed well, but I wanted to see what difference encoding makes; theoretically, the results should improve.

I can add the code here if needed, but it is for UBM-GMM, and the reason I am confused is that the training method you mentioned is what I am already using for the UBM.

If I only encode the test images, I cannot use them in the classifier because of the size mismatch.

Maybe I am not understanding this correctly or am making some silly mistake. Would it be possible to get a simple example through which I can understand how this works?

Thanks a lot

StuckInPhDNoMore asked Sep 29 '22 02:09


1 Answer

You have two phases in the process:
(1) training, where you learn some statistical properties of your domain, and
(2) testing, where you apply the learned representation/model to new samples.

Accordingly, you should split your dataset of features into two "splits": one for learning the GMMs for the Fisher encoding (a training set), and another to apply the encoding to (a test set).
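
For instance, assuming imageList{c} is a (hypothetical) cell array holding the file names of class c, a simple split could look like this:

trainImages = {} ; testImages = {} ;
for c = 1:4
    trainImages = [trainImages, imageList{c}(1:600)] ;    % learn the GMM from these
    testImages  = [testImages,  imageList{c}(601:1000)] ; % apply the encoding to these
end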

Usually you sample a significant number of images that represent your domain of interest well (for example, if you are interested in people, you should include many pictures of people indoors and outdoors, closeups and group photos, etc.). You extract as many SIFT descriptors as you can from these training images and use them to learn the model:

numClusters = 30 ;
[means, covariances, priors] = vl_gmm(TrainingData, numClusters);

Once you have this model, save it. You can then apply it to new photos to encode them:

encoding = vl_fisher(TestData, means, covariances, priors);

Note that while TrainingData is in general very large and may be collected from dozens (or even hundreds) of images, TestData may be significantly smaller, and may even be the descriptors collected from a single image.
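
Putting it all together, here is a minimal sketch of the full pipeline (hypothetical variable names: trainImages and allImages are cell arrays of image file names; VLFeat is assumed to be on the MATLAB path). The key point is that vl_fisher maps every image, whatever its number of keypoints, to a vector of the same fixed length 2 * 128 * numClusters, which resolves the size mismatch from the update:

% --- Training phase: pool SIFT descriptors from many training images ---
trainingData = [] ;
for i = 1:numel(trainImages)
    im = single(rgb2gray(imread(trainImages{i}))) ; % vl_sift needs single grayscale
    [~, d] = vl_sift(im) ;                          % d: 128 x numKeypoints (uint8)
    trainingData = [trainingData, single(d)] ;
end

numClusters = 128 ;
[means, covariances, priors] = vl_gmm(trainingData, numClusters) ;

% --- Encoding phase: one fixed-length Fisher vector per image ---
fv = zeros(2 * 128 * numClusters, numel(allImages), 'single') ;
for i = 1:numel(allImages)
    im = single(rgb2gray(imread(allImages{i}))) ;
    [~, d] = vl_sift(im) ;
    fv(:, i) = vl_fisher(single(d), means, covariances, priors) ;
end

Each column of fv is now a feature vector of identical length, so the columns for your classifier's training images and test images can be fed to any classifier (e.g. a linear SVM).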

Shai answered Oct 05 '22 01:10