 

Solr: Scores As Percentages

First of all, I already saw the lucene doc which tells us to not produce score as percentages:

People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs a "50%" match. This is also sometimes called a "normalized score"

Don't do this.

Seriously. Stop trying to think about your problem this way, it's not going to end well.

Because of these recommendations, I solved my problem another way.

However, there are a few points in Lucene's argument where I don't really understand why they are problematic in some cases.

For the case of this post, I can easily understand why it is bad: if a user does a search and sees the following results:

  • ProductA : 5 stars
  • ProductB : 2 stars
  • ProductC : 1 star

If ProductA is deleted after that first search, the user will be surprised the next time they come back and see the following results:

  • ProductB : 5 stars
  • ProductC : 3 stars

So, this problem is exactly what Lucene's doc is pointing out.


Now, let's take another example.

Imagine we have an e-commerce website that combines 'classic search' with phonetic search. The phonetic search is there to avoid as many empty result sets as possible caused by spelling mistakes. Scores of phonetic results are very low relative to scores of classic search.

In this case, the first idea was to return only results that score at least 10% of the maximum score. Results under this threshold will not be considered relevant for us, even if they come from the classic search.

If I do that, I don't have the problem from the post above: if a document is deleted, it seems logical that the old second product becomes the first one, and the user will not be surprised (it is the same behavior as if I had kept the score as a float value).

Furthermore, if scores from the phonetic search are very low, as we expect, we keep the same behavior of returning only relevant results.
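As a sketch of that idea, here is client-side filtering of a ranked result list, keeping only documents scoring at least 10% of the top score. The dict shape and the example scores are assumptions for illustration, not Solr's exact response format:

```python
# Sketch: keep only documents whose score is at least 10% of the
# top score in this result set. The "score" key and the example
# values are assumed for illustration.

def filter_relevant(docs, threshold_ratio=0.1):
    """Keep docs scoring at least threshold_ratio * max score in the set."""
    if not docs:
        return []
    max_score = max(doc["score"] for doc in docs)
    cutoff = max_score * threshold_ratio
    return [doc for doc in docs if doc["score"] >= cutoff]

docs = [
    {"id": "ProductA", "score": 4.2},   # classic match
    {"id": "ProductB", "score": 3.9},   # classic match
    {"id": "ProductC", "score": 0.15},  # phonetic-only match
]
print(filter_relevant(docs))  # ProductC falls below 0.42 and is dropped
```

Note that the cutoff is recomputed per result set, so it shifts whenever the top document changes, which is exactly the relative behavior described above.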


So my questions are: is it always bad to normalize scores, as the Lucene doc warns? Is my example an exception, or is it a bad idea to do this even in my case?

asked Mar 17 '23 by alexf

1 Answer

The Lucene score values are, as you've covered, only relevant for expressing the relative strength of each match within a set of matches. Outside the context of a particular set of search results, the score for a particular record has no absolute meaning.

For this reason, the only appropriate normalization of the scores would be one to normalize the relationships between relevancy of documents within a result set, and even then you'll want to be very careful about how you employ this information.

Consider this result set, where we examine the score of each record as compared to the immediately preceding result:

ProductA         (Let's pretend the score is 10)
ProductB:  97%   (9.7)
ProductC:   8.5% (.82)
ProductD: 100%   (.82)
ProductE: 100%   (.82)
ProductF:  24%   (.2)

In this case, the first two results have very similar scores, while the next three have the same score but trail significantly. These numbers are clearly not to be shared with the shoppers online, but the low relative scores at ProductC and ProductF represent sharp enough drops that you could use them to inform other display options. Maybe ProductA and ProductB get displayed in a larger font than the others. If only one product appears before a precipitous drop, it could get even more special highlighting.
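The ratio-to-preceding-result idea from the table above can be sketched in a few lines. The scores are the made-up values from the table, and the 50% drop threshold is an assumption chosen for illustration:

```python
# Sketch: compare each score to the immediately preceding one and
# flag sharp drops that could drive display decisions (larger font,
# special highlighting). Scores are the made-up values from the table.

def relative_ratios(scores):
    """Ratio of each score to the immediately preceding one."""
    return [curr / prev for prev, curr in zip(scores, scores[1:])]

def drop_points(scores, drop_threshold=0.5):
    """Indices of results scoring below drop_threshold of their predecessor."""
    return [i + 1 for i, r in enumerate(relative_ratios(scores))
            if r < drop_threshold]

scores = [10, 9.7, 0.82, 0.82, 0.82, 0.2]
print(drop_points(scores))  # [2, 5]: sharp drops at ProductC and ProductF
```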

I would caution against completely suppressing relatively lower-scored results in this kind of search. As your own example shows, relative scores may be misleading, and unless your relevancy is very finely tuned, the most relevant documents may not always be the most appropriate. It will do you no good if the desired results are dropped because of a single record that happens to repeat the search terms enough times to win a stellar score, and this is a real threat.

For example, "Hamilton Beach Three-In-One Convection Toaster Oven" will match one in eight words against a search for toaster, while "ToastMaster Toast Toaster Toasting Machine TOASTER" will match as many as five in seven words depending on how you index. (Both product names are completely made up, but I wanted the second one to look less reputable.)
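A toy illustration of that count, using regex tokenization and a crude prefix "stem" (both assumptions; a real Lucene analyzer tokenizes and stems differently, which is why the counts above vary "depending on how you index"):

```python
import re

def match_fraction(title, stem):
    """Count tokens whose crude prefix-'stem' matches, out of all tokens.

    Assumed tokenization: lowercase runs of letters. A camel-case-aware
    tokenizer would split "ToastMaster" into two tokens, changing the counts.
    """
    tokens = re.findall(r"[a-z]+", title.lower())
    hits = sum(1 for t in tokens if t.startswith(stem))
    return hits, len(tokens)

print(match_fraction("Hamilton Beach Three-In-One Convection Toaster Oven", "toast"))
# (1, 8) -- one matching token out of eight
print(match_fraction("ToastMaster Toast Toaster Toasting Machine TOASTER", "toast"))
# (5, 6) -- almost every token matches the stem
```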

Also, all returned documents are matches, no matter how low their scores might be. Sometimes a low-ranked result is the dark-horse find that the user really wants. Users will not understand that there are matching documents beyond what they see unless you tell them, so you might hide the trailing results on "page 2", or behind a cut, but you probably don't want to block them. Letting the user understand the size of their result set can also help them decide how to fine-tune their search. Using the significant drops in score as thresholds for paging could be very interesting, but probably a challenging implementation.
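Using the score drops as page boundaries could be sketched like this. This is a toy partitioning with a hypothetical 50% drop threshold; a real implementation would still need to handle cursors, ties, and minimum page sizes:

```python
# Sketch: split a ranked result list into "pages" at sharp score drops,
# so trailing low-scored matches stay reachable rather than blocked.
# The 50% threshold and the doc shape are assumptions for illustration.

def pages_by_drops(docs, drop_threshold=0.5):
    """Partition ranked docs into pages at points where a result
    scores below drop_threshold of its predecessor."""
    if not docs:
        return []
    pages, current = [], [docs[0]]
    for prev, doc in zip(docs, docs[1:]):
        if doc["score"] < prev["score"] * drop_threshold:
            pages.append(current)
            current = []
        current.append(doc)
    pages.append(current)
    return pages

docs = [{"id": "A", "score": 10}, {"id": "B", "score": 9.7},
        {"id": "C", "score": 0.82}, {"id": "D", "score": 0.82},
        {"id": "E", "score": 0.82}, {"id": "F", "score": 0.2}]
print([len(p) for p in pages_by_drops(docs)])  # [2, 3, 1]
```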

answered Mar 26 '23 by frances