Solr Custom Similarity

I want to set my own custom similarity in my Solr schema.xml, but I have trouble understanding this feature. I want to completely deactivate Solr scoring (tf, idf, coord and fieldNorm).

I don't know where to start. Things I know:

  1. I have to write my own DefaultSimilarity implementation.
  2. Override the tf, idf, coord and fieldNorm methods.
  3. Load the class in schema.xml.

Where do I store the class? Are there any working examples on the web? I can't find one!

Thanks

asked Dec 06 '13 16:12 by Stillmatic1985



1 Answer

The Similarity implementation changed in Solr 8.0.

Here is an example of how to do it as of Solr 8:

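// CustomSimilarityFactory.java
package com.christoph.solr;

import org.apache.lucene.search.similarities.Similarity;
import org.apache.solr.search.similarities.SchemaSimilarityFactory;

// Factory referenced from schema.xml; it hands Solr the custom similarity.
public class CustomSimilarityFactory extends SchemaSimilarityFactory {

    @Override
    public Similarity getSimilarity() {
        return new CustomSimilarity();
    }

}

// CustomSimilarity.java
package com.christoph.solr;

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.Similarity;

public class CustomSimilarity extends Similarity {

    private final SimScorer customSimScorer = new CustomSimScorer();

    // Store the same norm for every field, so field length has no effect.
    @Override
    public long computeNorm(FieldInvertState state) {
        return 1L;
    }

    // Boost, collection and term statistics are all ignored here.
    @Override
    public SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats) {
        return customSimScorer;
    }

}

// CustomSimScorer.java
package com.christoph.solr;

import org.apache.lucene.search.similarities.Similarity.SimScorer;

public class CustomSimScorer extends SimScorer {

    // Every matching term scores 1, regardless of term frequency or norm.
    @Override
    public float score(float freq, long norm) {
        return 1f;
    }

}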

Build these classes into a JAR, add it to solrconfig.xml with <lib dir="/yourCustomDir/" regex=".*\.jar"/>, and register the custom similarity in schema.xml with <similarity class="com.christoph.solr.CustomSimilarityFactory"></similarity>
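To make it clearer what a constant scorer switches off, here is a small plain-JDK sketch. It is not Lucene or Solr code: the class ScoringSketch and both methods are hypothetical names, and the TF-IDF formula is a simplified stand-in (square-root tf, logarithmic idf, inverse-square-root length norm) rather than Lucene's exact computation. It only shows that a TF-IDF style score moves with term frequency, document frequency and field length, while a constant score does not.

```java
// Illustration only: plain JDK, no Lucene or Solr classes involved.
// All names below are hypothetical and exist just for this sketch.
class ScoringSketch {

    // Simplified TF-IDF style score for one term in one document.
    // Assumption: tf = sqrt(freq), idf = 1 + ln((docCount + 1) / (docFreq + 1)),
    // lengthNorm = 1 / sqrt(fieldLength). Not Lucene's exact formula.
    static double tfIdfScore(int freq, long docFreq, long docCount, int fieldLength) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
        double lengthNorm = 1.0 / Math.sqrt(fieldLength);
        return tf * idf * lengthNorm;
    }

    // Counterpart of a scorer that always returns 1: every input is ignored.
    static double constantScore(int freq, long docFreq, long docCount, int fieldLength) {
        return 1.0;
    }

    public static void main(String[] args) {
        // Same term and collection, but the second document repeats the term 4 times.
        System.out.println("tf-idf, freq=1: " + tfIdfScore(1, 10, 1000, 100));
        System.out.println("tf-idf, freq=4: " + tfIdfScore(4, 10, 1000, 100));
        System.out.println("constant, freq=1: " + constantScore(1, 10, 1000, 100));
        System.out.println("constant, freq=4: " + constantScore(4, 10, 1000, 100));
    }
}
```

With a constant score, all matching documents tie, so the result order is decided by whatever sort or tie-breaking the query specifies.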

answered Oct 05 '22 01:10 by Christoph