Relevance feedback in Apache Solr

Tags: solr, lucene, information-retrieval

I would like to implement relevance feedback in Solr. Solr already has a More Like This feature: Given a single document, return a set of similar documents ranked by similarity to the single input document. Is it possible to configure Solr's More Like This feature to behave like More Like Those? In other words: Given a set of documents, return a list of documents similar to the input set (ranked by similarity).

According to an answer to a related question, Solr's More Like This can be turned into More Like Those in the following way:

1. Take the URL of the query that returns the specified documents. For example, the URL http://solrServer:8983/solr/select?q=id:1%20id:2%20id:3 returns the response to the query id:1 id:2 id:3, which is practically the concatenation of documents 1, 2 and 3.
2. Put the above URL (the concatenation of the specified documents) in the stream.url GET parameter of the More Like This handler: http://solrServer:8983/solr/mlt?mlt.fl=text&mlt.mintf=0&stream.url=http://solrServer:8983/solr/select%3Fq=id:1%20id:2%20id:3. Now the More Like This handler treats the concatenation of documents 1, 2 and 3 as a single input document and returns a ranked set of documents similar to the concatenation.
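
As a minimal sketch of the two steps above, the request could be assembled like this. The helper name more_like_those_url is made up for illustration; the host, handler paths and parameters are the ones from the example URLs. The whole inner URL is percent-encoded so that its own '?', '=' and spaces are not interpreted as part of the outer request. Note that stream.url only works if remote streaming is enabled on the Solr server.

```python
from urllib.parse import urlencode, quote

SOLR = "http://solrServer:8983/solr"  # host taken from the example above


def more_like_those_url(doc_ids, field="text"):
    """Hypothetical helper: build the /mlt request for a set of document ids."""
    # Step 1: the inner query selects the chosen documents by id.
    inner_query = " ".join(f"id:{doc_id}" for doc_id in doc_ids)
    inner_url = f"{SOLR}/select?" + urlencode({"q": inner_query}, quote_via=quote)

    # Step 2: pass the inner URL as stream.url. urlencode percent-encodes the
    # entire value, so the inner '?', '=' and '%' survive the outer request.
    outer_params = {"mlt.fl": field, "mlt.mintf": 0, "stream.url": inner_url}
    return f"{SOLR}/mlt?" + urlencode(outer_params, quote_via=quote)


url = more_like_those_url([1, 2, 3])
print(url)
```

Decoding the stream.url parameter of the resulting URL once gives back the inner /select URL unchanged, which is what the More Like This handler then fetches.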

This is a pretty bad implementation: Treating the set of input documents like one big document discriminates against short documents because short documents occupy a small portion of the entire big document.
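
To make the bias concrete, here is a toy illustration with made-up documents. It uses raw term counts only and ignores IDF, which is a simplification of what More Like This actually scores:

```python
from collections import Counter

# Made-up feedback documents: one long, one short.
long_doc = ["solr"] * 50 + ["lucene"] * 30 + ["index"] * 20
short_doc = ["rocchio", "feedback", "rocchio"]

# Treating both as one big document merges their term counts.
merged = Counter(long_doc + short_doc)
top_merged = [term for term, _ in merged.most_common(3)]

# Looking at the short document alone, its own top term is obvious.
top_short = Counter(short_doc).most_common(1)[0][0]

print(top_merged)  # ['solr', 'lucene', 'index']
print(top_short)   # rocchio
```

The short document's most characteristic term never makes it into the top terms of the concatenation, so it has no influence on the generated query.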

Solr's More Like This feature is implemented as a variation of the Rocchio algorithm: it takes the top terms of the (single) input document (the terms with the highest TF-IDF values, at most 25 by default via the mlt.maxqt parameter) and uses those terms as the modified query, boosted according to their TF-IDF. I am looking for a way to configure Solr's More Like This feature to take multiple documents as its input, extract the top n terms from each input document and query the index with those terms boosted according to their TF-IDF.
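
The behaviour I have in mind could be sketched roughly as follows. This is not Solr code: the function names, the smoothed IDF formula, the per-document normalization and the toy statistics are all assumptions made for illustration. Only the field:term^boost query syntax is standard Lucene.

```python
import math
from collections import Counter


def top_terms(tokens, doc_freq, num_docs, n):
    """Return the n (term, tf-idf) pairs with the highest score for one document."""
    tf = Counter(tokens)
    scores = {
        # Smoothed IDF (an assumption, not Lucene's exact formula).
        term: freq * (math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1)
        for term, freq in tf.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


def more_like_those_query(feedback_docs, doc_freq, num_docs, n=2, field="text"):
    """Merge the top-n terms of every feedback document into one boosted query."""
    boosts = Counter()
    for tokens in feedback_docs:
        ranked = top_terms(tokens, doc_freq, num_docs, n)
        if not ranked:
            continue
        best = ranked[0][1]
        for term, score in ranked:
            # Normalize per document so long documents do not dominate.
            boosts[term] += score / best
    return " ".join(f"{field}:{term}^{boost:.2f}" for term, boost in boosts.most_common())


# Made-up index statistics and feedback documents.
doc_freq = {"solr": 900, "lucene": 800, "index": 700, "rocchio": 5, "feedback": 20}
long_doc = ["solr"] * 50 + ["lucene"] * 30 + ["index"] * 20
short_doc = ["rocchio", "feedback", "rocchio"]

query = more_like_those_query([long_doc, short_doc], doc_freq, num_docs=1000)
print(query)
```

Because terms are selected per document and each document's scores are scaled by its own best term, the short document contributes its top term with the same weight as the long one.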

Is it possible to configure More Like This to behave that way? If not, what is the best way to implement relevance feedback in Solr?


asked Jun 09 '13 08:06

snakile

1 Answer

Unfortunately, it is not possible to configure the MLT handler that way.

One way to do it would be to implement a custom SearchComponent and register it with a (dedicated) SearchHandler.

I've already done something similar, and it is quite easy if you look at the original implementation of the MLT component.

The most difficult part is the synchronization of the results from different shard servers, but it can be skipped if you do not use shards.

I would also strongly recommend using your own parameter names in your implementation to prevent collisions with other components.
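
As a small client-side illustration of the naming advice, a request to such a component might keep all of its parameters under a prefix of its own. The prefix mlthose. and the parameter names below are hypothetical; they are not part of Solr.

```python
from urllib.parse import urlencode

# Hypothetical namespace for the custom component's parameters, chosen so
# that it cannot be confused with the built-in "mlt." parameters.
PREFIX = "mlthose."


def build_params(doc_ids, fields, terms_per_doc):
    """Build request parameters for the (hypothetical) custom component."""
    return {
        PREFIX + "ids": ",".join(str(doc_id) for doc_id in doc_ids),
        PREFIX + "fl": ",".join(fields),
        PREFIX + "termsPerDoc": terms_per_doc,
    }


params = build_params([1, 2, 3], ["text"], 20)
query_string = urlencode(params)
print(query_string)  # mlthose.ids=1%2C2%2C3&mlthose.fl=text&mlthose.termsPerDoc=20
```

With a distinct prefix, the custom component can sit in the same handler as the stock MLT component without either one reading the other's settings.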


answered Oct 21 '22 10:10

Roman K
