
Keeping query statistics using Lucene

Tags:

java

lucene

I am developing a search component of a web application using Lucene. I would like to save the user queries to an index and use them to suggest alternate queries to users, and to keep query statistics (most often used queries, top scoring queries, ...).

To use this data for alternate query suggestions, I would analyze the queries to see which terms are most often used with one another and use that to create a suggestion to the user.
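To make the co-occurrence idea concrete, here is a minimal sketch in plain Java, with no Lucene involved. The class name `TermCooccurrence` and its methods are hypothetical, and splitting on whitespace is only a stand-in for whatever Analyzer the real search component uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper: counts how often two terms appear in the same query. */
final class TermCooccurrence {

    // Key is "termA termB" with termA sorted before termB,
    // so each unordered pair is counted under exactly one key.
    private final Map<String, Integer> pairCounts = new HashMap<>();

    /** Records one user query. Whitespace splitting is an assumption, not Lucene analysis. */
    void addQuery(String rawQuery) {
        String normalized = rawQuery.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return;
        }
        // TreeSet drops repeated terms and yields them in sorted order.
        List<String> terms =
                new ArrayList<>(new TreeSet<>(Arrays.asList(normalized.split("\\s+"))));
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                pairCounts.merge(terms.get(i) + " " + terms.get(j), 1, Integer::sum);
            }
        }
    }

    /** Number of recorded queries that contained both terms. */
    int count(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        String key = x.compareTo(y) < 0 ? x + " " + y : y + " " + x;
        return pairCounts.getOrDefault(key, 0);
    }

    /** Terms seen together with the given term, most frequent first, ties alphabetical. */
    List<String> suggestionsFor(String term, int limit) {
        String t = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> partners = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            String[] pair = e.getKey().split(" ");
            if (pair[0].equals(t)) {
                partners.put(pair[1], e.getValue());
            } else if (pair[1].equals(t)) {
                partners.put(pair[0], e.getValue());
            }
        }
        List<String> ranked = new ArrayList<>(partners.keySet());
        ranked.sort((a, b) -> {
            int byCount = partners.get(b).compareTo(partners.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.max(0, Math.min(limit, ranked.size()))));
    }

    public static void main(String[] args) {
        TermCooccurrence stats = new TermCooccurrence();
        stats.addQuery("lucene index");
        stats.addQuery("Lucene index size");
        stats.addQuery("lucene tutorial");
        System.out.println(stats.suggestionsFor("lucene", 2));
    }
}
```

Counting pairs in memory like this only works for modest query volumes; for larger volumes the same counts could be derived from an index or a log instead.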

But I can't figure out in which form to index the data. I was thinking of simply adding the queries into the index, but in that way there could be a lot of redundant data since many documents in the index would have the same content. Does anyone have any ideas about the way this can be accomplished?

Thanks for the help.

jbradaric asked Nov 25 '10 14:11


2 Answers

"I was thinking of simply adding the queries into the index, but in that way there could be a lot of redundant data since many documents in the index would have the same content"

You can tell Lucene to index a field without storing its content, which means that the principal overhead will be the unique terms and the index itself. So indexing each query as its own Document might not add much overhead, and this way you will not be throwing away any information.

Joel answered Nov 08 '22 20:11


First, I believe that you should store the queries separately from the existing index. The problem is not redundant data but rather "watering down" your index: storing the queries in the same index may harm the relevance of your searches. Some options for this are:

  • Use a separate Lucene index.
  • Use Solr, with two separate cores, one for the documents and the other for the queries.
  • Use a query log. Store the scores with the queries and build the query statistics in a post-processing step. As this is a web application, you can probably use the logs of your servlet container, such as Tomcat, for this.
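As a rough illustration of the query-log option, the sketch below post-processes log lines into the two statistics mentioned in the question (most often used queries, top scoring queries). The log format is an assumption: one line per search, with the query text and its top hit score separated by a tab. The class name `QueryLogStats` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical post-processor for log lines of the assumed form "query<TAB>score". */
final class QueryLogStats {

    private final Map<String, Integer> counts = new HashMap<>();
    private final Map<String, Double> bestScore = new HashMap<>();

    /** Parses one log line; malformed lines are silently skipped. */
    void addLogLine(String line) {
        String[] parts = line.split("\t");
        if (parts.length != 2) {
            return;
        }
        String query = parts[0].trim().toLowerCase(Locale.ROOT);
        if (query.isEmpty()) {
            return;
        }
        double score;
        try {
            score = Double.parseDouble(parts[1].trim());
        } catch (NumberFormatException e) {
            return;
        }
        counts.merge(query, 1, Integer::sum);
        bestScore.merge(query, score, Double::max); // keep the highest score seen
    }

    /** How many times the (case-normalized) query was logged. */
    int count(String query) {
        return counts.getOrDefault(query.trim().toLowerCase(Locale.ROOT), 0);
    }

    /** Queries ordered by how often they were issued, most frequent first. */
    List<String> mostFrequent(int limit) {
        return topKeys(counts, limit);
    }

    /** Queries ordered by the best score they ever achieved, highest first. */
    List<String> topScoring(int limit) {
        return topKeys(bestScore, limit);
    }

    // Sorts keys by descending value, breaking ties alphabetically.
    private static <V extends Comparable<V>> List<String> topKeys(Map<String, V> values, int limit) {
        List<String> keys = new ArrayList<>(values.keySet());
        keys.sort((a, b) -> {
            int byValue = values.get(b).compareTo(values.get(a));
            return byValue != 0 ? byValue : a.compareTo(b);
        });
        return new ArrayList<>(keys.subList(0, Math.max(0, Math.min(limit, keys.size()))));
    }

    public static void main(String[] args) {
        QueryLogStats stats = new QueryLogStats();
        stats.addLogLine("lucene index\t1.5");
        stats.addLogLine("Lucene Index\t2.5");
        stats.addLogLine("solr cores\t4.0");
        System.out.println(stats.mostFrequent(1));
        System.out.println(stats.topScoring(2));
    }
}
```

Keeping the statistics out of the search path this way means the log can be reprocessed later if the definition of a "popular" query changes.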

Second, "Auto-Suggest From Popular Queries Using EdgeNGrams" describes an alternative implementation of query suggestion using Solr.

Yuval F answered Nov 08 '22 19:11