We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring.
The problem is that they want scores that make results fall into buckets:
The first thing we did was develop a custom similarity class that returns the correct score depending on the field and on whether the match is exact or partial.
The only problem now is that when a document matches on both the category and name the scores are added together.
Example: searching for "restaurant" returns documents in the category restaurant that also have the word restaurant in their name. These get a score of 5 (4+1), but they should only get 4.
I assume that for this to work we would need to develop a custom Scorer class, but we have no idea how to incorporate one in Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr.
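To make the intended behavior concrete, here is a minimal sketch in plain Java with no Lucene or Solr classes involved. The class and method names are hypothetical, and the values 4 and 1 are simply the scores from the example above. It only contrasts adding the per-field scores with taking the best one.

```java
public class BucketScoreSketch {
    // Assumed per-field scores, taken from the example: category match = 4, name match = 1.
    static final float CATEGORY_SCORE = 4f;
    static final float NAME_SCORE = 1f;

    // Current behavior: the scores of all matching fields are added together.
    static float summed(boolean categoryMatch, boolean nameMatch) {
        return (categoryMatch ? CATEGORY_SCORE : 0f) + (nameMatch ? NAME_SCORE : 0f);
    }

    // Wanted behavior: the document gets the score of its best matching field only.
    static float bucketed(boolean categoryMatch, boolean nameMatch) {
        return Math.max(categoryMatch ? CATEGORY_SCORE : 0f, nameMatch ? NAME_SCORE : 0f);
    }

    public static void main(String[] args) {
        System.out.println(summed(true, true));   // prints 5.0
        System.out.println(bucketed(true, true)); // prints 4.0
    }
}
```

So what we are after is a way to combine the per-field scores with a maximum rather than a sum.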
Maybe there is even a simpler solution that we don't know about.
All suggestions welcome!
Scorers are part of Lucene queries and are obtained through the query's 'weight' method.
In short, the framework calls Query.weight(..).scorer(..). Have a look at:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html
To use your own Query class in Solr, you'll need to implement your own Solr QParserPlugin that creates your own QParser, which in turn generates the Lucene Query you implemented. You can then use it in Solr as described here:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
This part of the implementation should stay simple, as it is just glue code.
Enjoy hacking Solr!
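You can override the logic Solr's scorer uses. Solr uses the DefaultSimilarity class for scoring.
import org.apache.lucene.search.DefaultSimilarity; // package as of Lucene 2.x; may differ in other versions

public class CustomSimilarity extends DefaultSimilarity {
    public CustomSimilarity() {
        super();
    }

    // Term frequency factor: a constant makes repeated terms count only once.
    public float tf(int freq) {
        // your code
        return 1.0f;
    }

    // Inverse document frequency factor: a constant ignores how rare the term is.
    public float idf(int docFreq, int numDocs) {
        // your code
        return 1.0f;
    }
}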
Then register the class in schema.xml:
<similarity class="<your package name>.CustomSimilarity"/>
You can check out various factors affecting score here
For your requirement, you can assign a result to a bucket when its score falls within a specific range. Also read about field boosting, document boosting, etc.; those might be helpful in your case.
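As a rough illustration of the range idea, here is a small sketch in plain Java. The class name, the method name, and the lower bounds are all hypothetical; you would pick thresholds that match your own field scores. The point is only that a summed score such as 5 (4+1) and a plain category score of 4 end up in the same bucket.

```java
public class ScoreRangeBuckets {
    // Assumed lower bounds, highest first. Scores >= 4 go to bucket 3,
    // scores >= 2 to bucket 2, scores >= 1 to bucket 1, anything lower to bucket 0.
    static final float[] LOWER_BOUNDS = {4f, 2f, 1f};

    // Returns the bucket number for a raw score by finding the first bound it reaches.
    static int bucketFor(float score) {
        for (int i = 0; i < LOWER_BOUNDS.length; i++) {
            if (score >= LOWER_BOUNDS[i]) {
                return LOWER_BOUNDS.length - i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(5f)); // prints 3
        System.out.println(bucketFor(4f)); // prints 3
        System.out.println(bucketFor(1f)); // prints 1
    }
}
```

This mapping would run on the client side or in your own post-processing of the returned scores; it does not change how Solr orders documents within a bucket.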