I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula looks like this:
score(q,d) = coord(q,d) x queryNorm(q) x SUM <t in q> ( tf(t in d) x idf(t)^2 x t.getBoost() x norm(t,d) )
I understand every component in this formula except queryNorm(q). As explained by the official documentation,
queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable.
Why would I need to compare scores between different queries? In other words, could you give an example of a context in which queryNorm(q) is useful?
Good question; I've wondered about this myself. According to this ScoresAsPercentages argument, attempting to compare scores from different queries or indexes, or even scores from the same query and index at different times, is a bad idea, and I agree.
My understanding is that, while queryNorm doesn't make scores strictly comparable, it does help: they are closer to comparable with the default queryNorm than without it.
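To make that concrete, here is a rough sketch in Python (not Lucene code). It assumes the classic TF-IDF definitions, queryNorm = 1 / sqrt(sumOfSquaredWeights) and idf = 1 + ln(numDocs / (docFreq + 1)), and fixes coord, all boosts, and norm(t,d) at 1. The index size, document frequencies, and tf values are made up for illustration.

```python
import math

def idf(doc_freq, num_docs):
    """Classic TF-IDF style idf (assumed): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(idfs):
    """1 / sqrt(sum of squared term weights); all boosts assumed to be 1."""
    return 1.0 / math.sqrt(sum(w * w for w in idfs))

def score(idfs, tfs, normalize=True):
    """SUM over query terms of tf * idf^2; coord, boosts, and norm(t,d) fixed at 1."""
    raw = sum(tf * w * w for tf, w in zip(tfs, idfs))
    return raw * query_norm(idfs) if normalize else raw

NUM_DOCS = 1_000_000  # hypothetical index size

# Hypothetical queries: one made of rare terms, one made of common terms.
rare_query = [idf(9, NUM_DOCS), idf(99, NUM_DOCS)]
common_query = [idf(99_999, NUM_DOCS), idf(499_999, NUM_DOCS)]

# Hypothetical tf values for two documents, one value per query term.
docs = {"doc1": [1.0, 2.0], "doc2": [2.0, 0.0]}

def ranking(idfs, normalize):
    """Document ids sorted by descending score for the given query."""
    return sorted(docs, key=lambda d: score(idfs, docs[d], normalize), reverse=True)

# Within one query, every score is multiplied by the same factor, so order is unchanged.
same_ranking = ranking(rare_query, True) == ranking(rare_query, False)

# Across queries, compare the score of a document matching every term once.
best_tfs = [1.0, 1.0]
raw_ratio = score(rare_query, best_tfs, False) / score(common_query, best_tfs, False)
normed_ratio = score(rare_query, best_tfs, True) / score(common_query, best_tfs, True)

print("ranking unchanged by queryNorm:", same_ranking)
print("rare/common score ratio without queryNorm:", round(raw_ratio, 2))
print("rare/common score ratio with queryNorm:", round(normed_ratio, 2))
```

In this toy setup the ranking within a query is identical either way, while the gap between the two queries' scores shrinks once queryNorm is applied. The scores are still not on a common scale, which fits the "closer, but not strictly comparable" reading.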
I suppose it could also enable people to write their own similarity and use this call to create normalized, comparable scores using algorithms that work in their particular case.
There has been some discussion on dropping it, which you might find interesting.