I am performing a boolean query with multiple terms. I only want to process results with a score above a particular threshold. My problem is that I don't understand how this value is calculated. I understand that high numbers mean it's a good match and low numbers mean it's a bad match, but there doesn't seem to be any upper bound.
Is it possible to normalize the scores over the range [0,1]?
Here is a page describing how scores are calculated in Lucene:
http://lucene.apache.org/java/3_0_0/scoring.html
The short answer is that the absolute value of a document's score doesn't really mean anything outside the context of a given search result set. In other words, there isn't a good way of translating the scores into a human definition of relevance, even if you do normalize them.
That being said, you can easily normalize the scores by dividing each hit's score by the maximum score in the result set. So if the first (highest-scoring) hit's score is 2.5, divide every hit's score by 2.5, and you'll get a number between 0 and 1, with the top hit at exactly 1.
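As a rough illustration of that division, here is a minimal sketch in plain Java. It assumes you have already copied the raw scores into a `float[]` (in Lucene they would typically come from the `score` field of each `ScoreDoc` in the results); the class name `ScoreNormalizer` and the method `normalize` are made up for this example and are not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: rescales raw scores into [0, 1].
public class ScoreNormalizer {

    /** Divides every score by the largest score, so the best hit maps to 1.0. */
    public static float[] normalize(float[] scores) {
        // Find the maximum instead of assuming the array is sorted by score.
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }

        float[] normalized = new float[scores.length];
        if (max <= 0f) {
            // No positive score to divide by; leave everything at 0.
            return normalized;
        }
        for (int i = 0; i < scores.length; i++) {
            normalized[i] = scores[i] / max;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Example raw scores; 2.5 is the top hit, as in the text above.
        float[] raw = {2.5f, 1.25f, 0.5f};
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

Keep in mind that the normalized values are still only relative to this one result set: a 0.8 from one query is not comparable to a 0.8 from another, so a fixed threshold on them remains a heuristic.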