 

Solr/Lucene Scorer

Tags:

solr

lucene

We are currently working on a proof-of-concept for a client using Solr and have been able to configure all the features they want except the scoring.

The problem is that they want scores that place results into buckets:

  • Bucket 1: exact match on category (score = 4)
  • Bucket 2: exact match on name (score = 3)
  • Bucket 3: partial match on category (score = 2)
  • Bucket 4: partial match on name (score = 1)
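The bucket rules can be written down as a small lookup. This is a Lucene-independent sketch: BucketRules, Field and bucketScore are hypothetical names, and it assumes the exact/partial decision has already been made elsewhere.

```java
// Hypothetical helper encoding the four buckets from the question.
// Lucene-independent: it only maps (field, match type) to a fixed score.
enum Field { CATEGORY, NAME }

final class BucketRules {
    private BucketRules() {}

    /** Returns the bucket score for a match on one field, or 0 if the field did not match. */
    static int bucketScore(Field field, boolean matched, boolean exact) {
        if (!matched) {
            return 0;
        }
        switch (field) {
            case CATEGORY:
                return exact ? 4 : 2; // bucket 1 (exact) or bucket 3 (partial)
            case NAME:
                return exact ? 3 : 1; // bucket 2 (exact) or bucket 4 (partial)
            default:
                throw new IllegalArgumentException("unknown field: " + field);
        }
    }

    public static void main(String[] args) {
        System.out.println(bucketScore(Field.CATEGORY, true, true)); // 4
        System.out.println(bucketScore(Field.NAME, true, false));    // 1
    }
}
```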

The first thing we did was develop a custom similarity class that returns the correct score depending on the field and on whether the match is exact or partial.

The only problem now is that when a document matches on both the category and the name, the scores are added together.

Example: searching for "restaurant" returns documents in the category restaurant that also contain the word restaurant in their name; these get a score of 5 (4 + 1), but they should only get 4.
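The arithmetic in this example can be reproduced directly. The Lucene-independent sketch below (CombineScores, sum and max are hypothetical names) contrasts adding the per-field scores, which is what the question observes, with keeping only the best one, which is what the buckets require. For reference, Lucene's DisjunctionMaxQuery with a tie-breaker multiplier of 0 also combines its sub-queries by taking the maximum, so it may be worth evaluating, although its final scores would still pass through the similarity's normalization.

```java
// Hypothetical illustration of why the "restaurant" document scores 5 instead of 4.
import java.util.Arrays;

final class CombineScores {
    private CombineScores() {}

    /** Adds the per-field scores together, as observed in the question. */
    static int sum(int... fieldScores) {
        return Arrays.stream(fieldScores).sum();
    }

    /** Keeps only the best per-field score, so a document lands in its highest bucket. */
    static int max(int... fieldScores) {
        return Arrays.stream(fieldScores).max().orElse(0);
    }

    public static void main(String[] args) {
        int category = 4; // bucket 1: exact match on category
        int name = 1;     // bucket 4: partial match on name
        System.out.println(sum(category, name)); // 5
        System.out.println(max(category, name)); // 4
    }
}
```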

I assume that for this to work we would need to develop a custom Scorer class, but we have no idea how to incorporate it into Solr. Another option is to create a custom SortField implementation similar to the RandomSortField already present in Solr.

Maybe there is even a simpler solution that we don't know about.

All suggestions welcome!

TFor asked Jun 14 '10 08:06


2 Answers

A Scorer is part of a Lucene Query and is obtained through the query's weight(...) method.

In short, the framework calls Query.weight(...).scorer(...). Have a look at:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Query.html

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Weight.html

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Scorer.html
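To show where a custom combination rule would live in the Query → Weight → Scorer chain, here is a model built on simplified stand-in types. Everything below is hypothetical: SimpleQuery, SimpleWeight, SimpleScorer and MaxFieldQuery are not Lucene classes, and they omit document iteration, normalization and IndexReader access. In real Lucene the same logic would go into your own Query, Weight and Scorer subclasses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins mirroring Lucene's Query -> Weight -> Scorer chain.
// NOT Lucene's real interfaces: a "document" is just a map of field -> per-field bucket score.
interface SimpleScorer {
    float score(Map<String, Float> doc);
}

interface SimpleWeight {
    SimpleScorer scorer();
}

interface SimpleQuery {
    SimpleWeight weight();
}

/** A query over several fields whose scorer keeps the best field score instead of the sum. */
final class MaxFieldQuery implements SimpleQuery {
    private final List<String> fields;

    MaxFieldQuery(List<String> fields) {
        this.fields = new ArrayList<>(fields);
    }

    @Override
    public SimpleWeight weight() {
        // weight() returns a Weight whose scorer() yields the Scorer, like Query.weight(..).scorer(..).
        return () -> doc -> {
            float best = 0f;
            for (String field : fields) {
                best = Math.max(best, doc.getOrDefault(field, 0f));
            }
            return best;
        };
    }

    public static void main(String[] args) {
        SimpleQuery query = new MaxFieldQuery(List.of("category", "name"));
        Map<String, Float> doc = Map.of("category", 4f, "name", 1f);
        System.out.println(query.weight().scorer().score(doc)); // 4.0
    }
}
```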

To use your own Query class in Solr, you'll need to implement your own Solr QParserPlugin that returns your own QParser, which in turn generates your previously implemented Lucene Query. You can then register it in Solr as described here:

http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

This part of the implementation should stay simple, as it is just glue code.
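To illustrate how thin that glue is, here is a dependency-free sketch of its shape. In Solr the plugin would extend QParserPlugin and return a QParser whose parse() builds your Lucene Query; the types below (ParserPlugin, Parser, BucketQuery, BucketParserPlugin) and the field names category and name are hypothetical stand-ins so the example compiles without Solr on the classpath.

```java
import java.util.List;

// Hypothetical stand-ins for Solr's QParserPlugin/QParser pair; not Solr's real signatures.
interface Parser {
    BucketQuery parse();
}

interface ParserPlugin {
    Parser createParser(String queryString);
}

/** Stand-in for the custom Lucene Query: it only records what it should search. */
final class BucketQuery {
    final String term;
    final List<String> fields;

    BucketQuery(String term, List<String> fields) {
        this.term = term;
        this.fields = fields;
    }

    @Override
    public String toString() {
        return "BucketQuery(" + term + " in " + fields + ")";
    }
}

/** The glue: turn the raw query string into the custom query over both fields. */
final class BucketParserPlugin implements ParserPlugin {
    @Override
    public Parser createParser(String queryString) {
        // Trimming and lower-casing are assumptions about how the input should be normalized.
        return () -> new BucketQuery(queryString.trim().toLowerCase(), List.of("category", "name"));
    }

    public static void main(String[] args) {
        System.out.println(new BucketParserPlugin().createParser(" Restaurant ").parse());
        // BucketQuery(restaurant in [category, name])
    }
}
```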

Enjoy hacking Solr!

jeje answered Nov 27 '22 19:11


You can override the scoring logic Solr uses. By default, Solr scores documents with the DefaultSimilarity class.

  • Make a class extending DefaultSimilarity and override methods such as tf() and idf() according to your needs:
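    import org.apache.lucene.search.DefaultSimilarity; // Lucene 2.x/3.x package

    public class CustomSimilarity extends DefaultSimilarity {

      // Term frequency factor: a constant ignores how often the term occurs in the field.
      @Override
      public float tf(int freq) {
        // your code
        return 1.0f;
      }

      // Inverse document frequency factor: a constant ignores how rare the term is.
      @Override
      public float idf(int docFreq, int numDocs) {
        // your code
        return 1.0f;
      }
    }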
    

  • After creating the class, compile it and package it as a jar.
  • Put the jar in the lib folder of the corresponding index or core.
  • Register the class in the schema.xml of the corresponding index: <similarity class="<your package name>.CustomSimilarity"/>
  • You can check out the various factors affecting the score here

For your requirement, you could map scores that fall within a specific range into buckets. Also read about field boosting, document boosting, etc.; they might be helpful in your case.
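Mapping score ranges onto buckets could be a small post-processing step like the sketch below. ScoreBuckets, bucketFor and the threshold values are hypothetical; real thresholds would have to be calibrated against the scores your similarity actually produces.

```java
// Hypothetical post-processing step: map a raw relevance score onto the client's buckets.
// The thresholds are placeholders, not values taken from Solr or Lucene.
final class ScoreBuckets {
    private ScoreBuckets() {}

    // Lower bounds for buckets 1..4, highest first (hypothetical values).
    private static final float[] LOWER_BOUNDS = {4f, 3f, 2f, 1f};

    /** Returns the bucket number (1 = best), or 0 when the score is below every bound. */
    static int bucketFor(float score) {
        for (int i = 0; i < LOWER_BOUNDS.length; i++) {
            if (score >= LOWER_BOUNDS[i]) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(bucketFor(4.7f)); // 1
        System.out.println(bucketFor(2.3f)); // 3
        System.out.println(bucketFor(0.5f)); // 0
    }
}
```

Range mapping alone cannot tell a summed 3 + 2 = 5 apart from a genuine bucket-1 match, so it only helps if the per-field scores are prevented from adding up across buckets.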

    like image 27
    ameykpatil Avatar answered Nov 27 '22 19:11

    ameykpatil