How to store a hierarchical k-means tree for a large number of images, using OpenCV?

I am trying to make a program that will find similar images in a dataset of images. The steps are:

  • Extract SURF descriptors for all images
  • Store the descriptors
  • Apply kNN to the stored descriptors
  • Match the query image's descriptors against the stored descriptors using kNN

Now, each image's SURF descriptors will be stored as a hierarchical k-means tree. Should I store each tree as a separate file, or is it possible to build some sort of single tree holding the descriptors of all images, updated as images are added to the dataset?
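For the extraction and matching steps, a minimal sketch using OpenCV's Python bindings might look like the following. This assumes an opencv-contrib build with the non-free `xfeatures2d` module enabled (SURF is patented); the image paths are placeholders:

```python
import cv2

# SURF ships only in opencv-contrib builds ("non-free" module).
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)     # placeholder path
img2 = cv2.imread("database.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

# FLANN-based kNN matching with a KD-tree index and Lowe's ratio test.
FLANN_INDEX_KDTREE = 1
flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
                              dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
```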

This is the paper I am basing the program on.

asked Apr 02 '11 by AquaAsh

1 Answer

Are you quite sure you want to do it with SURF descriptors? I'm working on a similar application myself, based on this paper (Nister, Stewenius), and they swear SIFT descriptors are the way to go. But I guess you could do it with any other descriptor as well.

Looking at the paper you referenced: it is newer than the work I linked to, but it does not reference either the Nister paper or this work (Sivic, Zisserman), which are, to my knowledge, the foundational works for all content-based image retrieval problems.

To get a better understanding of the problem before I started implementing it, I first read Sivic, Zisserman for the general idea behind the system. They apply only a simple, flat clustering after extracting all the SIFT descriptors from all the features. For better accuracy they use two different types of features: Shape Adapted (centered on corner-like features) and Maximally Stable (corresponding to blobs of high contrast; you can look them up in this paper (Matas et al.)). The scalability of their system is not all that good because every feature is stored directly, but they introduced the concept of inverted files, a technique from text retrieval (you can read about its basics here), which speeds up the lookup process significantly.
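To make the inverted-file idea concrete, here is a minimal Python sketch (the names `index_image` and `query_candidates` are mine, not from either paper): each visual word maps to the set of images containing it, so a query only ever touches images that share at least one word with it.

```python
from collections import defaultdict

# Inverted file: visual word id -> set of image ids containing that word.
inverted_file = defaultdict(set)

def index_image(image_id, word_ids):
    """Record which visual words occur in this image."""
    for w in word_ids:
        inverted_file[w].add(image_id)

def query_candidates(word_ids):
    """Return only the images sharing at least one visual word with the
    query - a small fraction of the database, hence the speed-up."""
    candidates = set()
    for w in word_ids:
        candidates |= inverted_file[w]
    return candidates
```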

After conquering that work, I recommend going on to Nister, Stewenius, where they introduce hierarchical k-means clustering over L levels, both for storing the features and for the later search of the image database. Now, unless I am very much mistaken, you don't store each descriptor as a separate tree. Instead, you build one tree from the existing features, where the cluster centers at each level are the representative, "central" features of their clusters.

Once the tree is constructed to the required depth (they recommend 10 clusters over 6 levels), each cluster center at the last level represents only a very small number of features, and thus you can actually forget all the original features (or at least their descriptors). Each original feature can be represented by its corresponding cluster center, so instead of descriptors, for each image you only need to store which cluster centers (features) it contains. This is much cheaper, since one or two integers per feature are enough to encode its path through the tree. The simplest scheme: at each level, record the index of the cluster the feature belongs to. With 10 clusters that takes 4 bits, and over 6 levels 4 * 6 = 24 bits, so the whole path fits into a single 32-bit integer. You can of course implement the actual coding any way you see fit. Oh, and they also use SIFT descriptors, computed on MSER regions.
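Here is a rough sketch of that scheme, assuming descriptors arrive as a NumPy array (N x 64 for SURF, N x 128 for SIFT) and using scikit-learn's KMeans as a stand-in clusterer (any k-means implementation would do); `build_tree` and `encode_path` are illustrative names, not from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in; any k-means works

BRANCH = 10   # clusters per level -> 4 bits each
LEVELS = 6    # tree depth -> 4 * 6 = 24 bits < 32

def build_tree(descriptors, level=0):
    """Recursively k-means the descriptors into BRANCH children per node."""
    node = {"center": descriptors.mean(axis=0), "children": None}
    if level < LEVELS and len(descriptors) >= BRANCH:
        km = KMeans(n_clusters=BRANCH, n_init=3).fit(descriptors)
        node["children"] = [
            build_tree(descriptors[km.labels_ == i], level + 1)
            for i in range(BRANCH)
        ]
    return node

def encode_path(tree, descriptor):
    """Greedily descend the tree, packing the branch index chosen at each
    level into 4 bits of a single integer, as described above."""
    code, node = 0, tree
    while node["children"]:
        centers = np.array([c["center"] for c in node["children"]])
        i = int(np.argmin(np.linalg.norm(centers - descriptor, axis=1)))
        code = (code << 4) | i
        node = node["children"][i]
    return code
```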

Also, if the images you use for building the vocabulary tree are representative (e.g. you're working on a dataset of open-space pictures and you build the tree from just a representative portion of them, but you know there are no pictures of industrial workplaces in the rest of the dataset), you can add new pictures very fast. The only thing you need to do when adding a new picture to the dataset is determine which of the precomputed cluster centers best represent its features (as mentioned before, the last level of cluster centers is quite precise) and store the information about those cluster centers (the before-mentioned integers). Looking up a cluster center is really fast: only 10 comparisons at each of the 6 levels.
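Continuing the sketch above (again, `add_image` is a hypothetical helper, not an API from the paper or OpenCV), adding a picture just quantizes its descriptors against the fixed tree, 10 comparisons at each of the 6 levels per descriptor, and records the resulting codes in the inverted file:

```python
def add_image(tree, inverted_file, image_id, descriptors):
    """Quantize a new image against the fixed vocabulary tree and update
    the inverted file. Only the integer codes are kept; the raw
    descriptors can be discarded afterwards."""
    for d in descriptors:
        code = encode_path(tree, d)  # 10 comparisons x 6 levels
        inverted_file.setdefault(code, set()).add(image_id)
```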

Hope this is actually useful to somebody someday, since the question is just over a year old. :)

answered Sep 29 '22 by penelope