
How to normalize Lucene scores?

I need to normalize the Lucene scores between 0 and 1.

For example, a random query returns the following scores...

8.864665
2.792687
2.792687
2.792687
2.792687
0.49009037
0.33730242 
0.33730242 
0.33730242 
0.33730242 

What is the maximum possible score? Is it 10.0?

Thanks.

aneuryzm asked Mar 21 '11 14:03



3 Answers

You can divide every score by the maximum score to get scores between 0 and 1.

However, please note that the normalised scores should only be used to compare the results of a single query. It is not correct to compare the scores (normalised or not) of results from two different queries.
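As a rough illustration of the division described above, here is a minimal sketch in Python. The `normalize_scores` helper is a hypothetical name, not part of Lucene, and the list of scores stands in for whatever your search call returns (it reuses the values from the question). The handling of an empty or non-positive result set is an assumption made for the example.

```python
def normalize_scores(scores):
    """Scale scores so that the best hit of this one query maps to 1.0."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        # Assumption: with no positive score there is nothing sensible to scale by.
        return [0.0 for _ in scores]
    return [s / top for s in scores]


# Raw scores copied from the question (a single query's result list).
scores = [
    8.864665,
    2.792687, 2.792687, 2.792687, 2.792687,
    0.49009037,
    0.33730242, 0.33730242, 0.33730242, 0.33730242,
]

normalized = normalize_scores(scores)
for raw, norm in zip(scores, normalized):
    print(f"{raw:>11.6f} -> {norm:.4f}")
```

The top hit always becomes 1.0 regardless of how good the match actually was, which is why the resulting values are only meaningful relative to the other hits of the same query.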

nikhil500 answered Oct 23 '22 23:10


There is no good standard way to normalize scores with Lucene. Read this: ScoresAsPercentages and this explanation.

In your case, the highest score is the score of the first result, provided the results are sorted by score. That maximum is specific to this query, though, and will be different for every other query.

See also how-do-i-normalise-a-solr-lucene-score

morja answered Oct 23 '22 21:10


There is no maximum score in Solr. The score depends on too many variables, so its upper bound cannot be predicted.

You can implement a normalized score (see Scores As Percentages), but this is not recommended.

See related links for more details:

Is it possible to set a Solr Score threshold 'reasonably', independent of results returned? (i.e. Is Solr Scoring standardized in any way)

how do I normalise a solr/lucene score?

Remove results below a certain score threshold in Solr/Lucene?

kenorb answered Oct 23 '22 22:10