
Solr Relevancy - How to A/B Test for Search Quality?

Tags:

testing

solr

I am looking to run live A/B and controlled side-by-side experiments to understand how changes affect search quality. I will be testing variables such as boost values and fuzzy queries.

What other metrics are used to determine whether users prefer A vs. B? Here are two metrics I found online:

  • In Google Analytics, "% Search Exits" is a metric you can use to measure the quality of your site-search results.

  • Another way to measure search quality is to count the number of search result pages the visitor views.
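Both of those metrics can be computed offline from session event logs. Below is a minimal sketch; the session/event format is hypothetical, so adapt the field names to whatever your analytics pipeline actually records.

```python
# Sketch: computing "% Search Exits" and pages-per-search from raw session logs.
# The (type, detail) event format here is an assumption, not a real analytics API.

def search_metrics(sessions):
    """Each session is a list of events; an event is a (type, detail) tuple,
    e.g. ("search", "laptop") or ("view", "/product/42")."""
    searches = 0
    search_exits = 0          # session ended immediately after a search
    pages_after_search = 0    # result pages viewed following each search
    for events in sessions:
        for i, (etype, _) in enumerate(events):
            if etype != "search":
                continue
            searches += 1
            if i == len(events) - 1:
                search_exits += 1
            # count consecutive page views right after this search
            j = i + 1
            while j < len(events) and events[j][0] == "view":
                pages_after_search += 1
                j += 1
    return {
        "search_exit_rate": search_exits / searches if searches else 0.0,
        "pages_per_search": pages_after_search / searches if searches else 0.0,
    }

sessions = [
    [("search", "solr"), ("view", "/p1"), ("view", "/p2")],
    [("search", "lucene")],   # searched, then left the site: a search exit
]
print(search_metrics(sessions))  # {'search_exit_rate': 0.5, 'pages_per_search': 1.0}
```

Run the same computation per experiment bucket (A vs. B) and compare the two rates.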

phpboy asked Aug 22 '11 01:08


1 Answer

Search quality is not easily measurable. To measure relevance you need a couple of things:

  1. A competitor to measure relevance against. In your case, the different instances of your search engine will be competitors for each other: one instance runs the basic algorithm, another has fuzzy matching enabled, another has both fuzzy matching and boosting, and so on.

  2. Manually rated results. You can ask your colleagues to rate query/URL pairs for popular queries, and for the holes (i.e., query/URL pairs that were not rated) you can build a dynamic ranking function using a "Learning to Rank" algorithm (http://en.wikipedia.org/wiki/Learning_to_rank). Don't be surprised by that; it's true (see the Google/Bing example below).
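To make point 2 concrete, here is a toy pointwise learning-to-rank sketch: a linear scoring function fitted to judge grades with plain gradient descent. The features and grades are invented for illustration; a real system would use richer features (BM25 score, click counts, freshness, ...) and a proper LTR library.

```python
# Sketch: pointwise learning-to-rank with made-up features and grades.
# Each example is ((feature vector for a query/url pair), judged grade 0-3).
data = [
    ((0.9, 0.2), 3),
    ((0.7, 0.1), 2),
    ((0.3, 0.8), 1),
    ((0.1, 0.9), 0),
]

w = [0.0, 0.0]   # linear model weights
lr = 0.1         # learning rate
for _ in range(2000):
    for (x1, x2), grade in data:
        pred = w[0] * x1 + w[1] * x2
        err = pred - grade
        # gradient step on squared error
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2

def score(x):
    """Predicted relevance grade for an unjudged query/url pair."""
    return w[0] * x[0] + w[1] * x[1]

# Fill a "hole": score a pair no judge ever rated
print(score((0.8, 0.15)))
```

The learned function then supplies approximate grades for query/URL pairs your judges never saw.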

Google and Bing are competitors in the horizontal search market. These search engines employ manual judges around the world and invest millions in having them rate results for queries. Typically the top 3 or top 5 results for each query/URL pair are rated. Based on these ratings they can use a metric like NDCG (Normalized Discounted Cumulative Gain), which is one of the finest and most popular metrics.

According to Wikipedia:

"Discounted cumulative gain (DCG) is a measure of effectiveness of a Web search engine algorithm or related applications, often used in information retrieval. Using a graded relevance scale of documents in a search engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated from the top of the result list to the bottom with the gain of each result discounted at lower ranks."

Wikipedia explains NDCG well; it is a short article and worth reading in full.
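For reference, NDCG is short enough to implement directly. This is a minimal sketch using the standard log2 discount, with a made-up set of judge grades:

```python
import math

def dcg(rels):
    """Discounted cumulative gain for a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(rels):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# Judge grades for the top 5 results of one query (3 = perfect, 0 = bad):
print(ndcg([3, 2, 3, 0, 1]))  # ~0.97: close to the ideal ordering [3, 3, 2, 1, 0]
```

Average NDCG over all judged queries for each engine variant, and the variant with the higher mean is ranking judged-relevant documents closer to the top.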

As you mentioned, you can also use click-through rate/data, a kind of wisdom-of-the-crowd signal, and tweak relevance based on it. It is a very good approach, but it attracts spamming, so it has to be coupled with a metric such as NDCG or MAP to solve your relevance problem.
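For the live A/B side, the simplest click-based comparison is click-through rate per variant. A minimal sketch, assuming a hypothetical log of (variant, clicked) rows where each user was bucketed into ranker "A" or "B":

```python
from collections import defaultdict

# Sketch: comparing click-through rate between two ranker variants.
# The log rows are made up; in practice they come from your query/click logs.

def ctr_by_variant(log):
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for variant, was_clicked in log:
        shown[variant] += 1
        if was_clicked:
            clicked[variant] += 1
    return {v: clicked[v] / shown[v] for v in shown}

log = [
    ("A", True), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False),
]
print(ctr_by_variant(log))  # A ≈ 0.33, B ≈ 0.67
```

Before declaring a winner, check that the CTR gap is larger than what random noise would produce (e.g. with a significance test over enough impressions), since raw CTR on small samples is easily misleading.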

I can provide more details if you still want to know how all of this would fit together in your case.

Yavar answered Sep 18 '22 19:09