I'd like to store an n-dimensional feature vector, e.g. <1.00, 0.34, 0.22, ..., 0>, with each document, and then provide another feature vector as a query, with the results sorted in order of cosine similarity. Is this possible with Elasticsearch?
Elasticsearch's own implementation of vector search
Elasticsearch uses Apache Lucene internally as its search engine, so many (if not all) of the low-level concepts, data structures and algorithms apply equally to Solr and Elasticsearch.
Semantic search, a form of search commonly used in search engines, serves content to users by understanding the intent and meaning behind the search query. It is a step beyond traditional text and keyword matching.
The dense_vector field type stores dense vectors of float values. Dense vector fields can be used in the following ways:
- in script_score queries, to score documents matching a filter;
- in the kNN search API, to find the k most similar vectors to a query vector.
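For completeness, here is a minimal sketch of what this looks like with the official Python client against Elasticsearch 8.x. The index name docs and field name feature are made up for illustration, and the exact parameters can vary between versions.

```python
# Sketch: a dense_vector mapping plus a kNN search (Elasticsearch 8.x,
# official Python client). Index and field names are illustrative only.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Map a 4-dimensional dense_vector field, indexed for cosine similarity.
es.indices.create(
    index="docs",
    mappings={
        "properties": {
            "feature": {
                "type": "dense_vector",
                "dims": 4,
                "index": True,
                "similarity": "cosine",
            }
        }
    },
)

es.index(index="docs", document={"feature": [1.00, 0.34, 0.22, 0.0]})
es.indices.refresh(index="docs")

# Retrieve the k documents whose vectors are most similar to the query.
resp = es.search(
    index="docs",
    knn={
        "field": "feature",
        "query_vector": [0.90, 0.30, 0.20, 0.10],
        "k": 10,
        "num_candidates": 100,
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["feature"])
```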
Elasticsearch runs Lucene under the hood, so for ordinary text queries it historically used Lucene's Practical Scoring Function by default. This is a similarity model based on term frequency (tf) and inverse document frequency (idf) that also uses the vector space model (VSM) for multi-term queries. (Recent versions default to BM25 instead.)
I don't have an answer specific to Elasticsearch because I've never used it (I use Lucene, on which Elasticsearch is built). However, I'll give a generic answer to your question. There are two standard ways to obtain the nearest vectors given a query vector, described as follows.
K-d tree
The first approach is to store the vectors in memory with the help of a data structure that supports nearest-neighbour queries, e.g. k-d trees. A k-d tree is a generalization of the binary search tree in the sense that every level of the tree partitions the data along one of the k dimensions. If you have enough space to load all the points in memory, you can apply the nearest-neighbour search algorithm on a k-d tree to obtain a list of retrieved vectors sorted by cosine similarity; note that if all vectors are normalized to unit length, ranking by Euclidean distance is equivalent to ranking by cosine similarity, so a standard k-d tree search suffices. The obvious disadvantage of this method is that it does not scale to the huge sets of points often encountered in information retrieval.
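As a concrete illustration, here is a small sketch using SciPy's KDTree. The key observation is that for unit-length vectors ||a - b||^2 = 2 - 2*cos(a, b), so a Euclidean nearest-neighbour query over normalized data returns exactly the top results by cosine similarity. The data below is random and purely illustrative.

```python
# Sketch: cosine-similarity nearest neighbours via a k-d tree.
# For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so Euclidean ranking
# on normalized data equals cosine-similarity ranking.
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
docs = rng.random((10_000, 8))                       # 10k 8-dim vectors
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # normalize to unit length

tree = KDTree(docs)

query = rng.random(8)
query /= np.linalg.norm(query)

# k nearest by Euclidean distance == top-k by cosine similarity.
dist, idx = tree.query(query, k=5)
cosine = 1.0 - dist**2 / 2.0                         # recover cosine values
for i, c in zip(idx, cosine):
    print(i, round(float(c), 4))
```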
Inverted Quantized Vectors
The second approach is to use inverted quantized vectors. A simple range-based quantization assigns pseudo-terms or labels to the real numbers of a vector so that they can later be indexed by Lucene (or, for that matter, Elasticsearch).
For example, we may assign the label A to the range [0, 0.1), B to the range [0.1, 0.2), and so on, up to J for [0.9, 1]. The sample vector in your question is then encoded as (J, D, C, ..., A), because 1.00 falls in [0.9, 1] (J), 0.34 falls in [0.3, 0.4) (D), 0.22 falls in [0.2, 0.3) (C), and 0 falls in [0, 0.1) (A).
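A minimal sketch of this encoding step in Python, using the ten-bucket A-J scheme above (the bucket width and alphabet are of course tunable):

```python
# Sketch: range-based quantization mapping each vector component to a
# pseudo-term (A..J for the ranges [0,0.1), [0.1,0.2), ..., [0.9,1]).
def quantize(vector):
    labels = "ABCDEFGHIJ"
    # min(..., 9) keeps the boundary value 1.0 inside the last bucket, J.
    # A possible refinement (not shown): prefix each label with its
    # dimension index, e.g. "0_J 1_D", so values from different
    # dimensions do not collide in the index.
    return " ".join(labels[min(int(x * 10), 9)] for x in vector)

doc = quantize([1.00, 0.34, 0.22, 0.0])
print(doc)                                 # -> "J D C A"

# A query vector is quantized the same way before searching, and the
# resulting string can be indexed as an ordinary text field.
print(quantize([0.95, 0.31, 0.05, 0.0]))   # -> "J D A A"
```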
A vector of real numbers is thus transformed into a string of pseudo-terms, which can be treated as a document and indexed with a standard information retrieval (IR) tool. A query vector is transformed into a bag of pseudo-terms in the same way, which makes it possible to retrieve the vectors in the collection most similar (in terms of cosine similarity or another measure) to the query.
The main advantage of this method is that it scales well to massive collections of real-valued vectors. The key disadvantage is that the computed similarity values are mere approximations of the true cosine similarities (due to the loss incurred by quantization). Smaller quantization ranges approximate the true similarities more closely, at the cost of a larger index.