I need a way to collapse duplicate (defined in terms of a string field with an id) results in solr. I know that such a feature is comming in the next version (1.5), but I can't wait for that. What would be the best way to remove duplicates using the current stable version 1.4?
Given that finding duplicates in my case is really easy (comparison of a string field), should it be a Filter, should I overwrite the existing SearchComponent or write a new Component, or use some external libraries like carrot2?
The overall result count should reflect the shortened result.
Well, there is a solution: just apply the collapse field patch (see http://issues.apache.org/jira/browse/SOLR-236 for the latest news about this feature, i also recommend you http://blog.jteam.nl/author/martijn).
Doing this you will get working the CollapseComponent . Notice that there is a searching performance degradation associated with this feature.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With