
How to calculate the score based on number of query terms in elasticsearch?

I want the queries to return a score that gets calculated like:

occurrence of each query term in title + description / number of query terms

for example

EbSearch.add [
  new_job(id: 1, title: "Java Programmierer",
          description: "Java Programmierer")
]

EbSearch.search("Java Programmierer").results.first.score.should == 4

At the moment it outputs 8, because it runs the query for each term and sums the results. I could just divide afterwards, but I don't have the analyzed query terms, so compounds could distort the score.

The query is structured like below:

search = Tire.search index_name do
  query do
    dis_max do
      query { string query, fields: ['title^3', 'description.with_synonyms^0.5'], use_dis_max: false, default_operator: "OR" }
      query { string query, fields: ['title^3', 'description.without_synonyms'], use_dis_max: false, default_operator: "OR" }
    end
  end
end

Any idea how I could solve this problem would be greatly appreciated.

EDIT

I realized that I did not provide enough context.

Here are some other snippets I already worked out. I wrote a custom SimilarityProvider to disable idf and normalization. https://gist.github.com/outsmartin/6114175

The complete Tire code can be found at https://gist.github.com/6114186. It is a little more complicated than the example, but it should be understandable.

outsmartin asked Jul 23 '13 16:07



1 Answer

You can easily get a list of the analyzed terms for your query using the analyze command. However, I have to mention that Elasticsearch scoring is much more complicated than it might seem when you run your tests on tiny indices. You can find the formula that Elasticsearch uses in the Lucene documentation, and you can use the explain command to see how this formula is applied to your results. I would also suggest testing and tuning your scoring algorithm on an index with a single shard, or using the dfs_query_then_fetch search type, which produces more precise results on small indices.
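To illustrate the "divide afterwards" idea with analyzed terms, here is a minimal Ruby sketch. It assumes the body returned by the analyze command is JSON with a `tokens` array whose entries carry a `token` field. The sample body below is hand-written for illustration and is not real server output. The helper names `analyzed_terms` and `score_per_term` are hypothetical, and counting each distinct token once is an assumption you may want to change if your analyzer emits synonyms or compound parts.

```ruby
require 'json'

# Hand-written sample shaped like an analyze response (illustrative only).
SAMPLE_ANALYZE_RESPONSE = <<~JSON
  {"tokens": [
    {"token": "java", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 1},
    {"token": "programmierer", "start_offset": 5, "end_offset": 18, "type": "<ALPHANUM>", "position": 2}
  ]}
JSON

# Hypothetical helper: pull the analyzed terms out of the response body.
def analyzed_terms(response_body)
  JSON.parse(response_body).fetch('tokens', []).map { |t| t['token'] }
end

# Hypothetical helper: divide a raw score by the number of distinct terms.
# Returns 0.0 when the analyzer produced no terms, to avoid dividing by zero.
def score_per_term(raw_score, terms)
  distinct = terms.uniq.size
  return 0.0 if distinct.zero?
  raw_score.to_f / distinct
end

terms = analyzed_terms(SAMPLE_ANALYZE_RESPONSE)
puts score_per_term(8, terms) # prints 4.0
```

In a real setup the response body would come from running the analyze command against the same index and analyzer that the query uses, so that compounds are split the same way as at search time.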

imotov answered Nov 15 '22 07:11