
Scoring in elasticsearch percolate-response

Current Situation

I am using the percolate feature of elasticsearch. It works well: I get the matching percolator IDs back for a new document, so I can basically build an inverse search. So far, so good.

Problem

Here comes the problem: I want a score expressing how well the given document matches the query of a percolator (exactly the score a normal query gives me). To get this I added the track_scores option, but had no luck.

I found this in the documentation for track_scores:

...The score is based on the query and represents how the query matched to the percolate query’s metadata and not how the document being percolated matched to the query...

Is what I want/need even possible?

Example showing the problem

Here is a sample demonstrating the problem (taken from elasticsearch.org). The score returned in the percolate response is always 1.0, regardless of the input document:

//Index the percolator
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}'

Percolate the first document:

curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office"
    },
    "track_scores" : "true"
}'


//...returns
{"took": 1, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- Score
    }
]}

Percolate a second (different) document:

curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office next to another bonsai tree is cool!"
    },
     "track_scores" : "true"
}'


//...returns
{"took": 3, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- SAME Score, but different document (other score needed here!)
    }
]}

What I need

I want a score of something like 0.8 for the first document and something like 0.9 for the second one. They must not get the same score, as they did here. How can I achieve this?

Thanks a lot for any idea and help.

Patrick Meier, asked Jun 20 '14 21:06




1 Answer

A score is relative to the other documents in the data set. You could potentially do some sort of custom scoring that focuses only on the term frequency/inverse document frequency of the document at hand. It probably won't be terribly effective, but it might be good enough.
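A minimal sketch of that first idea, in Python, for illustration only. It assumes a crude lowercase tokenizer in place of a real analyzer and keeps only a term-frequency component (the square root of the term count, as in Lucene's classic similarity). It drops inverse document frequency and length normalization, since a single percolated document has no corpus to compare against. The function names are hypothetical and this is not what elasticsearch computes internally.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def local_tf_score(query_text, doc_text):
    """Score a document against a match-style query using only that document.

    Sums sqrt(term frequency) over the distinct query terms. No index-wide
    statistics (such as IDF) are used, so scores are only comparable
    between documents scored against the same query.
    """
    counts = Counter(tokenize(doc_text))
    return sum(math.sqrt(counts[term]) for term in set(tokenize(query_text)))


query = "bonsai tree"
doc1 = "A new bonsai tree in the office"
doc2 = "A new bonsai tree in the office next to another bonsai tree is cool!"

score1 = local_tf_score(query, doc1)  # each query term occurs once -> 2.0
score2 = local_tf_score(query, doc2)  # each query term occurs twice -> 2 * sqrt(2)
print(score1, score2)
```

With the two sample documents from the question, the second one scores higher because both query terms occur twice. Adding length normalization would need care here: the second document is exactly twice as long with twice the matches, so a proportional normalization would make the two scores equal again.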

I am not sure if this is a viable solution for your problem, but one approach would be to re-run all matching percolate queries against the whole dataset, grab your document's score from that, and re-index the document with that data. Since it is all relative, this would potentially require you to then update all the other documents matching the query. It would likely be best to do the global re-score at some set interval.
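The re-run step could look roughly like the sketch below, which only builds a search request body and sends nothing. It assumes the percolated document has already been indexed into my-index under some ID (the "42" here is made up), that the registered percolator source has been fetched from my-index/.percolator/1, and that the cluster accepts the 1.x-era `filtered` query with an `ids` filter. Later versions would need a `bool` query with a `filter` clause instead. The helper name is hypothetical.

```python
import json


def build_rescore_body(percolator_source, doc_id):
    """Build a search body that scores one indexed document against a stored percolator query.

    The stored query does the scoring; the ids filter only restricts the
    result to the document of interest, so the _score in the response is
    computed with index-wide statistics.
    """
    return {
        "query": {
            "filtered": {
                "query": percolator_source["query"],
                "filter": {"ids": {"values": [doc_id]}},
            }
        }
    }


# Source of the percolator registered in the question (my-index/.percolator/1).
percolator_source = {"query": {"match": {"message": "bonsai tree"}}}

# "42" is a hypothetical ID for the already-indexed document.
body = build_rescore_body(percolator_source, "42")
print(json.dumps(body, indent=2))
```

The printed body would then be posted to the search endpoint of my-index, once per percolator that matched, and the returned _score stored with the document.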

ppearcy, answered Nov 15 '22 04:11
