
Best Practice of Field Collapsing in SOLR 1.4

Tags:

solr

I need a way to collapse duplicate results in Solr, where a duplicate is defined by a string field holding an id. I know that such a feature is coming in the next version (1.5), but I can't wait for that. What would be the best way to remove duplicates using the current stable version, 1.4?

Given that finding duplicates in my case is really easy (a comparison of a string field), should it be a Filter, should I override the existing SearchComponent, write a new component, or use an external library like carrot2?

The overall result count should reflect the collapsed (shortened) result set.

Dominik asked Apr 08 '10 06:04


1 Answer

Well, there is a solution: apply the field collapsing patch (see http://issues.apache.org/jira/browse/SOLR-236 for the latest news about this feature; I also recommend http://blog.jteam.nl/author/martijn).

Once the patch is applied, you get a working CollapseComponent. Note that this feature comes with some search performance degradation.
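To make the idea of collapsing on a string field concrete, here is a minimal Python sketch of the behaviour being asked for: keep the first document for each value of the id field and report the count of the collapsed list. This is only an illustration under assumptions, not the patch's implementation and not a Solr API; the function name `collapse_by_field` and the field name `dup_id` are hypothetical.

```python
def collapse_by_field(docs, field):
    """Keep only the first document seen for each value of `field`.

    `docs` is a list of dicts in ranked order. Documents that lack the
    field are kept as-is, since there is nothing to compare them on.
    """
    seen = set()
    collapsed = []
    for doc in docs:
        key = doc.get(field)
        if key is None:
            collapsed.append(doc)
            continue
        if key not in seen:
            seen.add(key)
            collapsed.append(doc)
    return collapsed


# Hypothetical ranked results; `dup_id` stands in for the string id field.
results = [
    {"title": "A", "dup_id": "x1"},
    {"title": "B", "dup_id": "x2"},
    {"title": "A (copy)", "dup_id": "x1"},
    {"title": "C"},
]

collapsed = collapse_by_field(results, "dup_id")
total = len(collapsed)  # the count reflects the shortened result
```

Doing this on the client only works if you fetch the whole result set, so the total and the paging stay consistent; that is the main reason to collapse on the server instead.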

Lici answered Nov 14 '22 02:11