Which analyzers should be used for indexing and for searching when I want an exact match to rank higher than a "partial" match? Or should I set up custom scoring in a Similarity class?
For example, when my index consists of car parts, car, and car shop (indexed with StandardAnalyzer on Lucene 3.5), a query for "car" returns all three, basically in the order in which they were added, since they all get the same score.
What I would like to see is car ranked first, then the other results (their order doesn't really matter; I assume the analyzer can influence that).
All three matches are exact (term car being matched, not 'ca' or 'ar') :)
If there's no more content in these fields ("car parts", "car" and "car shop"), then you could use lengthNorm() or computeNorm() (depending on the Lucene version) to give shorter fields more weight, so that car gets a higher score for being shorter. In Lucene 3.3.0, DefaultSimilarity.computeNorm() looks like this:
return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
where numTerms is the total number of terms in the field. So it's surprising that the "car" and "car shop" documents have the same score, because for "car" the norm is 1 and for "car shop" it should be about 0.7 (1/sqrt(2), assuming a boost of 1).
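To make the arithmetic concrete, here is a minimal plain-Java sketch of the formula quoted above, applied to the three field values from the question. It does not call Lucene: the class name LengthNormSketch, the helper lengthNorm() and the whitespace-based countTerms() are hypothetical stand-ins for illustration, and the real term count depends on the analyzer used at index time.

```java
public class LengthNormSketch {

    // Same arithmetic as the quoted computeNorm() line:
    // boost * 1/sqrt(numTerms). Hypothetical helper, not a Lucene API.
    public static float lengthNorm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    // Naive whitespace split; an assumption standing in for the
    // number of terms an analyzer would emit for the field.
    public static int countTerms(String fieldValue) {
        return fieldValue.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        String[] fields = {"car parts", "car", "car shop"};
        for (String field : fields) {
            int numTerms = countTerms(field);
            // One-term fields get a norm of 1, two-term fields about 0.707.
            System.out.println(field + " -> numTerms=" + numTerms
                    + ", norm=" + lengthNorm(1.0f, numTerms));
        }
    }
}
```

With a boost of 1, the single-term field gets the largest norm, which is why a length-based norm would be expected to rank car ahead of car parts and car shop.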