
Lucene scoring: in what context is queryNorm used?

I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula looks like this:

score(q,d) = coord(q,d) x queryNorm(q) x SUM_{t in q} ( tf(t in d) x idf(t)^2 x t.getBoost() x norm(t,d) )

I understand every component in this formula except queryNorm(q). As explained by the official documentation,

queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable.

Why would I need to compare scores between different queries? In other words, could you give an example of a context in which queryNorm(q) is useful?

Yuhao asked May 28 '13 06:05


1 Answer

Good question; I've wondered about this myself. According to this ScoresAsPercentages argument, attempting to compare scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.

My understanding is that queryNorm doesn't make scores strictly comparable, but it does help: scores are closer to comparable with the default queryNorm than without it.
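To make that concrete, here is a rough Python sketch (not Lucene code) of the formula from the question, assuming the default queryNorm is 1 / sqrt(sumOfSquaredWeights) and tf is sqrt(freq), as in DefaultSimilarity. coord and norm(t,d) are fixed at 1 for simplicity, and the idf values, boosts, and documents are made up for illustration.

```python
import math

def query_norm(term_weights):
    """Assumed default: 1 / sqrt(sum over query terms of (idf * boost)^2)."""
    sum_of_squares = sum((idf * boost) ** 2 for idf, boost in term_weights.values())
    return 1.0 / math.sqrt(sum_of_squares) if sum_of_squares > 0 else 1.0

def score(term_weights, doc_freqs, use_query_norm=True):
    """Simplified score(q,d): coord and norm(t,d) are taken to be 1."""
    raw = sum(
        math.sqrt(doc_freqs.get(term, 0)) * idf ** 2 * boost
        for term, (idf, boost) in term_weights.items()
    )
    return raw * query_norm(term_weights) if use_query_norm else raw

# Hypothetical data: term -> (idf, boost) for the query, term -> frequency per doc.
query = {"lucene": (1.5, 1.0), "scoring": (2.0, 1.0)}
boosted = {term: (idf, boost * 10) for term, (idf, boost) in query.items()}
docs = {"d1": {"lucene": 4, "scoring": 1}, "d2": {"lucene": 1}}

def ranking(term_weights, use_query_norm):
    """Document ids ordered by descending score."""
    return sorted(docs, key=lambda d: score(term_weights, docs[d], use_query_norm),
                  reverse=True)

for name, q in (("plain", query), ("boosted x10", boosted)):
    print(name,
          "raw:", score(q, docs["d1"], use_query_norm=False),
          "normalized:", score(q, docs["d1"]))
```

The raw scores of the two queries differ by a factor of ten, while the normalized scores land on the same scale (about 3.4 for d1 in both cases). Within a single query, the order of d1 and d2 is the same with or without the factor, which is why it has no effect on ranking.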

I suppose it also lets people write their own Similarity and override queryNorm to produce normalized, comparable scores, using an algorithm that works in their particular case.
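As a loose illustration of that idea, the following Python sketch models queryNorm as an overridable hook. The class and function names are hypothetical and only mimic the shape of Lucene's queryNorm(sumOfSquaredWeights) method; this is not the Lucene API itself.

```python
import math

class TfIdfSimilarity:
    """Hypothetical stand-in for a similarity with the default normalization."""

    def query_norm(self, sum_of_squared_weights):
        # Assumed default: 1 / sqrt(sumOfSquaredWeights).
        if sum_of_squared_weights <= 0:
            return 1.0
        return 1.0 / math.sqrt(sum_of_squared_weights)

class NoQueryNormSimilarity(TfIdfSimilarity):
    """Custom variant that opts out of query normalization entirely."""

    def query_norm(self, sum_of_squared_weights):
        return 1.0

def normalized_weights(similarity, weights):
    """Scale each query-term weight by the similarity's query norm."""
    sum_of_squares = sum(w * w for w in weights)
    factor = similarity.query_norm(sum_of_squares)
    return [w * factor for w in weights]

print(normalized_weights(TfIdfSimilarity(), [3.0, 4.0]))
print(normalized_weights(NoQueryNormSimilarity(), [3.0, 4.0]))
```

With the default hook the weights are scaled to unit length; the override leaves them untouched. Any other normalization that suits a particular index could be dropped in the same way.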

There has been some discussion on dropping it, which you might find interesting.

femtoRgon answered Oct 29 '22 16:10