Ignore tf/idf at query time in Solr

Tags: solr, lucene

I am trying to boost particular documents based on a field value. This generally works, but some documents get a higher score even though they have a smaller boost value.

After debugging the query with the debugQuery=on request parameter I have noticed that the idf function is returning a higher score for a particular document, which is affecting the overall score.

Is there a way to ignore tf/idf scoring at query time?

C0deAttack asked Dec 11 '12 17:12


1 Answer

You'll want to create a custom Similarity which overrides the tf and idf methods, and use it in place of the DefaultSimilarity.

Something like:

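package myorg.mypackage;

// Lucene 3.x location; in 4.x the class lives in
// org.apache.lucene.search.similarities instead.
import org.apache.lucene.search.DefaultSimilarity;

public class CustomSimilarity extends DefaultSimilarity {

    // Constant tf: how often a term occurs no longer affects the score.
    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    @Override
    public float tf(int freq) {
        return 1.0f;
    }

    // Constant idf: rare terms no longer outweigh common ones.
    // Note the signature of this method may now take longs:
    //   public float idf(long docFreq, long numDocs)
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}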

Then set it to use that similarity in your schema.xml:

<similarity class="myorg.mypackage.CustomSimilarity"/>
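
To see why constant tf and idf let the boost decide the ordering, here is a rough standalone sketch that does not use Lucene at all. It assumes the classic per-term formula tf * idf^2 * boost, with tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), and it leaves out field norms, coord and queryNorm. The class name, index size, document frequencies and boosts are all made up for illustration.

```java
public class ConstantTfIdfSketch {

    // Classic tf: square root of the term frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Classic idf: 1 + ln(numDocs / (docFreq + 1)).
    static float defaultIdf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Simplified per-term score: tf * idf^2 * boost.
    // Norms, coord and queryNorm are deliberately omitted.
    static float score(float tf, float idf, float boost) {
        return tf * idf * idf * boost;
    }

    public static void main(String[] args) {
        long numDocs = 1000; // hypothetical index size

        // Doc A matches a rare term (docFreq 5) and has boost 1.0.
        // Doc B matches a common term (docFreq 500) and has boost 2.0.
        float defaultA = score(defaultTf(1), defaultIdf(5, numDocs), 1.0f);
        float defaultB = score(defaultTf(1), defaultIdf(500, numDocs), 2.0f);
        System.out.println("default tf/idf:  A=" + defaultA + " B=" + defaultB);

        // With tf and idf pinned to 1.0, only the boost is left.
        float constantA = score(1.0f, 1.0f, 1.0f);
        float constantB = score(1.0f, 1.0f, 2.0f);
        System.out.println("constant tf/idf: A=" + constantA + " B=" + constantB);
    }
}
```

Under these assumed numbers, doc A outscores doc B with the default formulas despite its smaller boost, because the rare term's idf is squared. Once tf and idf are constant, B ranks first, which is the behaviour the custom Similarity is meant to give.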
femtoRgon answered Sep 28 '22 00:09