Binary features and Locality Sensitive Hashing (LSH)

Tags: machine-learning, computer-vision, nearest-neighbor

I am studying FLANN, a library for approximate nearest neighbors search.

For the LSH method they represent an object (a point in the search space) as an array of unsigned int. I am not sure why they do this rather than representing a point simply as a double array (which would be a point in a multi-dimensional vector space). Maybe because LSH is used for binary features? Can someone share more about the possible use of unsigned int in this case? Why unsigned int if you only need a 0 and 1 for each feature?

Thanks


asked Jan 12 '13 18:01

user1796942

1 Answer

Please note that I will refer to the latest FLANN release, i.e. flann-1.8.3 at the time of writing.

> For the LSH method they represent an object (point in search space), as an array of unsigned int

No: this is wrong. The LshIndex class includes a buildIndexImpl method that implements the LSH indexing. Since LSH is essentially a collection of hash tables, the actual indexing happens in the LshTable class.
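To make the "collection of hash tables" idea concrete, here is a minimal C++ sketch. The names (`MultiTableIndex`, `KeyFn`) are hypothetical and this is not FLANN's `LshIndex`/`LshTable` API; the per-table hash functions are simply supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: an LSH index as several hash tables, each with its own hash.
// NOT FLANN's LshIndex/LshTable API.
using Descriptor = std::vector<unsigned char>;
using KeyFn = std::function<std::size_t(const Descriptor&)>;

struct MultiTableIndex {
    std::vector<KeyFn> key_fns;  // one hash function per table (declared first: initialized first)
    std::vector<std::unordered_map<std::size_t, std::vector<unsigned int>>> tables;

    explicit MultiTableIndex(std::vector<KeyFn> fns)
        : key_fns(std::move(fns)), tables(key_fns.size()) {
        assert(!key_fns.empty());
    }

    // Indexing one (ID, descriptor) pair = inserting the ID into the matching bucket of every table.
    void add(unsigned int id, const Descriptor& d) {
        for (std::size_t t = 0; t < tables.size(); ++t)
            tables[t][key_fns[t](d)].push_back(id);
    }

    // Candidate neighbors of a query = union of the buckets it hashes to, one per table.
    std::set<unsigned int> candidates(const Descriptor& q) const {
        std::set<unsigned int> out;
        for (std::size_t t = 0; t < tables.size(); ++t) {
            auto it = tables[t].find(key_fns[t](q));
            if (it != tables[t].end()) out.insert(it->second.begin(), it->second.end());
        }
        return out;
    }
};
```

A query is then compared only against this union of buckets instead of the whole dataset; using several tables raises the chance that a true neighbor shares at least one bucket with the query.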

The elementary indexing method, i.e. the one that indexes a single feature vector (aka descriptor, or point) at a time, is:

```cpp
/** Add a feature to the table
 * @param value the value to store for that feature
 * @param feature the feature itself
 */
void add(unsigned int value, const ElementType* feature) {...}
```

Note: the buildIndexImpl method uses the alternative version, which simply iterates over the features and calls the above method on each.

As you can see, this method takes 2 arguments that form a pair (ID, descriptor):

- value, an unsigned int, is the unique numerical identifier of the feature vector (aka feature index)
- feature is the feature vector itself
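The (ID, descriptor) split can be sketched as follows, assuming a flat, row-major buffer of binary descriptors. `ToyTable` and its `toyKey` hash (just the first byte) are hypothetical placeholders, not FLANN code; the point is that a bucket stores the `unsigned int` row index, never the descriptor bytes, and that the "iterate over all features" overload derives each ID from the row position.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of add(value, feature); NOT FLANN's LshTable.
class ToyTable {
public:
    explicit ToyTable(std::size_t bytes_per_feature) : bytes_(bytes_per_feature) {}

    // value: the feature's numeric ID; feature: pointer to its descriptor bytes.
    void add(unsigned int value, const unsigned char* feature) {
        buckets_[toyKey(feature)].push_back(value);  // only the ID is stored
    }

    // "Iterate and call add" overload: the ID is simply the row index in the dataset.
    void add(const std::vector<unsigned char>& dataset) {
        assert(bytes_ > 0 && dataset.size() % bytes_ == 0);
        const std::size_t rows = dataset.size() / bytes_;
        for (std::size_t i = 0; i < rows; ++i)
            add(static_cast<unsigned int>(i), dataset.data() + i * bytes_);
    }

    const std::vector<unsigned int>& bucket(std::size_t key) const { return buckets_.at(key); }

private:
    // Placeholder hash for illustration only: the first byte of the descriptor.
    std::size_t toyKey(const unsigned char* feature) const { return feature[0]; }

    std::size_t bytes_;
    std::unordered_map<std::size_t, std::vector<unsigned int>> buckets_;
};
```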

If you look at the implementation, you can see that the first step consists of hashing the descriptor to obtain the corresponding bucket key (i.e. the identifier of the slot pointing to the bucket in which this descriptor's ID will be stored):

```cpp
BucketKey key = getKey(feature);
```

In practice, the getKey hashing function is only implemented for binary descriptors, i.e. descriptors that can be represented as an array of unsigned char:

```cpp
// Specialization for unsigned char
template<>
inline size_t LshTable<unsigned char>::getKey(const unsigned char* feature) const {...}
```
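The body of FLANN's `getKey` is elided above, so the following is only a sketch of one standard LSH family for Hamming space, bit sampling: pick `key_bits` random bit positions once, then concatenate the descriptor's bits at those positions into the bucket key. `BitSamplingKey` and its parameters are hypothetical names, and FLANN's actual key computation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Bit-sampling hash for binary descriptors stored as unsigned char arrays.
// Hypothetical sketch; not FLANN's getKey.
class BitSamplingKey {
public:
    BitSamplingKey(std::size_t descriptor_bytes, std::size_t key_bits, unsigned seed) {
        assert(descriptor_bytes > 0 && key_bits <= 8 * sizeof(std::size_t));
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> pick(0, descriptor_bytes * 8 - 1);
        for (std::size_t i = 0; i < key_bits; ++i) positions_.push_back(pick(rng));
    }

    // Read each sampled bit and append it to the key.
    std::size_t operator()(const unsigned char* feature) const {
        std::size_t key = 0;
        for (std::size_t pos : positions_) {
            const std::size_t bit = (feature[pos / 8] >> (pos % 8)) & 1u;
            key = (key << 1) | bit;
        }
        return key;
    }

private:
    std::vector<std::size_t> positions_;  // fixed at construction, shared by all features
};
```

Two descriptors at Hamming distance r out of n bits agree on any single sampled bit with probability 1 - r/n, so close descriptors are far more likely than distant ones to produce the same key and land in the same bucket.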

> Maybe because LSH is used for binary features?

Yes: as stated above, the FLANN LSH implementation works in the Hamming space for binary descriptors.
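In Hamming space, the distance between two binary descriptors is the number of bits in which they differ. A minimal sketch, working on the same `unsigned char` arrays that `getKey` takes (the helper name `hamming` is hypothetical):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hamming distance between two binary descriptors of `bytes` bytes each:
// XOR the bytes and count the set bits.
inline std::size_t hamming(const unsigned char* a, const unsigned char* b, std::size_t bytes) {
    assert(a != nullptr && b != nullptr);
    std::size_t dist = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        dist += std::bitset<8>(a[i] ^ b[i]).count();
    return dist;
}
```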

If you were to use real-valued descriptors (in R^d), you should refer to the original paper, which includes details about how to convert the feature vectors into binary strings so as to use the Hamming space and its hash functions.
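The answer does not say which conversion the paper uses. As one well-known illustration (not necessarily the paper's scheme), the unary embedding maps integer coordinates in [0, C] to bit strings whose Hamming distance equals the L1 distance of the original vectors. This sketch assumes the real values were already scaled and rounded to integers in [0, C]; `unaryEmbed` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unary embedding: each coordinate v in [0, C] becomes v ones followed by C - v zeros.
// Hamming distance between embeddings == L1 distance between the integer vectors.
std::vector<bool> unaryEmbed(const std::vector<int>& x, int C) {
    assert(C > 0);
    std::vector<bool> bits;
    bits.reserve(x.size() * static_cast<std::size_t>(C));
    for (int v : x) {
        assert(0 <= v && v <= C);
        for (int j = 0; j < C; ++j) bits.push_back(j < v);
    }
    return bits;
}
```

The price is length: a d-dimensional vector becomes d * C bits, so the quantization step C trades precision against descriptor size.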

> Can someone share more about the possible use of unsigned int in this case? Why unsigned int if you only need a 0 and 1 for each feature?

See above: the unsigned int value is only used to store the related ID of each feature vector.
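To see why an integer ID is all a bucket needs, here is a sketch of the last step of a query under assumed conventions (row-major dataset, `unsigned int` row index as ID). `nearestInBucket` is a hypothetical helper, not part of FLANN: each candidate ID is used to fetch the full binary descriptor, which is then compared to the query by Hamming distance.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: re-rank the IDs found in a bucket by true Hamming distance.
// dataset: row-major, bytes_per_feature bytes per descriptor; candidates: IDs from a bucket.
unsigned int nearestInBucket(const std::vector<unsigned char>& dataset, std::size_t bytes_per_feature,
                             const std::vector<unsigned int>& candidates, const unsigned char* query) {
    assert(!candidates.empty());
    unsigned int best_id = candidates.front();
    std::size_t best_dist = std::numeric_limits<std::size_t>::max();
    for (unsigned int id : candidates) {
        // The ID is just an index: it locates the descriptor in the dataset.
        const unsigned char* row = dataset.data() + id * bytes_per_feature;
        std::size_t dist = 0;
        for (std::size_t i = 0; i < bytes_per_feature; ++i)
            dist += std::bitset<8>(row[i] ^ query[i]).count();
        if (dist < best_dist) { best_dist = dist; best_id = id; }
    }
    return best_id;
}
```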


answered Sep 29 '22 21:09

deltheil

Related questions

Logistic regression using SciPy
File format for classification using SVM light
Is there any best practice to prepare features for text-based classification?
Break up Random forest classification fit into pieces in python?
How convert ML VectorUDT features from .mllib to .ml type
how to compute AUC(Area Under Curve) for recommendation system evaluation
Can I use `tf.nn.dropout` to implement DropConnect?
plotting in octave syntax
How to restore weights with different names but same shapes Tensorflow?
How to tie word embedding and softmax weights in keras?
How to make predictions with tf.estimator.Estimator from checkpoint?
Tensorflow Eager and Tensorboard Graphs?
Keras: Accuracy Drops While Finetuning Inception
Kernel in a logistic regression model LogisticRegression scikit-learn sklearn
XGBoost: AttributeError: 'DataFrame' object has no attribute 'feature_names'
Machine learning library for .net analog of Apache Mahout [closed]
Is it possible to detect blur, exposure, orientation of an image programmatically?
Why getting different results with MALLET topic inference for single and batch of documents?
Build a custom svm kernel matrix with opencv
Errors due to vowpal wabbit's dependencies on boost library
