I need to normalize the Lucene scores between 0 and 1.
For example, a random query returns the following scores...
8.864665
2.792687
2.792687
2.792687
2.792687
0.49009037
0.33730242
0.33730242
0.33730242
0.33730242
What's the biggest score ? 10.0 ?
thanks
Lucene uses a combination of the Vector Space Model (VSM) and the Boolean model of information Retrieval to determine how relevant a document is to a user's query. It assigns a default score between 0 and 1 to all search results, depending on multiple factors related to document relevancy.
Simply put, Lucene uses an “inverted indexing” of data – instead of mapping pages to keywords, it maps keywords to pages just like a glossary at the end of any book. This allows for faster search responses, as it searches through an index, instead of searching through text directly.
You can divide all scores with the maximum score to get scores between 0 and 1.
However, please note that the normalised scores should be used to compare the results of a single query only. It is not correct to compare the scores (normalised or not) of results from 2 different queries.
There is no good standard way to normalize scores with lucene. Read this: ScoresAsPercentages and this explanation
In your case the highest score is the score of the first result, if the results are sorted by score. But this score will be different for every other query.
See also how-do-i-normalise-a-solr-lucene-score
There is no maximum score in Solr, it depends on too many variables, so it can't be predicted.
But you can implement something called normalized score (Scores As Percentages) which is not recommended.
See related links for more details:
Is it possible to set a Solr Score threshold 'reasonably', independent of results returned? (i.e. Is Solr Scoring standardized in any way)
how do I normalise a solr/lucene score?
Remove results below a certain score threshold in Solr/Lucene?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With