Accessing the Lucene query in Elasticsearch's native script scorer

I'd like to write a custom Elasticsearch scorer that takes all terms from the indexed document and all terms from the query, and calculates the score from them using some custom logic.

After some research, it seems that the most straightforward way to implement a custom scorer for Elasticsearch in Java is to use its "native scripting" functionality (i.e. implementing AbstractDoubleSearchScript). The problem is that I can't find a way to access the original query object in such a script; I can only access the matching document and its fields. Is there some way to get access to the query object that was used for the search?

Alternatively, what is the best way to run custom Java code per result and score the match using my own (complex) algorithm that needs to know the complete term list for both the query and the document?

Lukáš Lalinský asked Mar 27 '15 00:03




1 Answer

Implement a custom Query class and wrap the actual query (for example, a boolean query) as its sub-query. The Query class gives you the API for implementing a custom scorer, and that scorer has access to both the query and the document currently being scored. For fine-grained control over the score, also implement a custom similarity class.
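To make the wrapping idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's real classes: SimpleQuery, AnyTermQuery, TermSetScorer and CustomScoreWrapper are hypothetical stand-ins invented for this illustration, and the Jaccard overlap is only a placeholder for your own algorithm. The real Lucene API for queries and scorers differs between versions, so treat this as the shape of the approach rather than code you can drop in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a query; NOT Lucene's org.apache.lucene.search.Query.
interface SimpleQuery {
    Set<String> terms();                    // all terms the query contains
    boolean matches(Set<String> docTerms);  // does the document match?
    double score(Set<String> docTerms);     // score for a matching document
}

// Toy "OR" query: matches when the document contains any of its terms.
final class AnyTermQuery implements SimpleQuery {
    private final Set<String> terms;

    AnyTermQuery(String... terms) {
        this.terms = new LinkedHashSet<>(Arrays.asList(terms));
    }

    public Set<String> terms() { return terms; }

    public boolean matches(Set<String> docTerms) {
        for (String t : terms) {
            if (docTerms.contains(t)) return true;
        }
        return false;
    }

    public double score(Set<String> docTerms) { return 1.0; } // constant default score
}

// Hypothetical callback that sees both complete term lists.
interface TermSetScorer {
    double score(Set<String> queryTerms, Set<String> docTerms);
}

// Wrapper query: delegates matching to the inner query, replaces its scoring.
final class CustomScoreWrapper implements SimpleQuery {
    private final SimpleQuery inner;
    private final TermSetScorer scorer;

    CustomScoreWrapper(SimpleQuery inner, TermSetScorer scorer) {
        this.inner = inner;
        this.scorer = scorer;
    }

    public Set<String> terms() { return inner.terms(); }

    public boolean matches(Set<String> docTerms) { return inner.matches(docTerms); }

    public double score(Set<String> docTerms) {
        // The wrapper holds the inner query, so the query terms are available here.
        return scorer.score(inner.terms(), docTerms);
    }
}

final class WrappingQueryDemo {
    // Placeholder custom logic: Jaccard overlap of the two term sets.
    static double jaccard(Set<String> queryTerms, Set<String> docTerms) {
        Set<String> union = new HashSet<>(queryTerms);
        union.addAll(docTerms);
        if (union.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(queryTerms);
        common.retainAll(docTerms);
        return (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        SimpleQuery query = new CustomScoreWrapper(
                new AnyTermQuery("lucene", "scorer"), WrappingQueryDemo::jaccard);
        Set<String> doc = new HashSet<>(
                Arrays.asList("lucene", "query", "scorer", "custom"));
        if (query.matches(doc)) {
            System.out.println(query.score(doc));
        }
    }
}
```

The point of the design is that the wrapper, not a script, owns the scoring step, so it can hand the inner query's terms to your own logic alongside the document's terms. In a real plugin the same roles would be played by your Query subclass and the scorer it creates.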

redragons answered Sep 29 '22 12:09