Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lucene: Score calculation with a PrefixQuery

I have a problem with the score calculation with a PrefixQuery. To change score of each document, when add document into index, I have used setBoost to change the boost of the document. Then I create PrefixQuery to search, but the result have not been changed according to the boost. It seems setBoost totally doesn't work for a PrefixQuery. Please check my code below:

 @Test
 public void testNormsDocBoost() throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);
    Document doc1 = new Document();
    Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED);
    doc1.add(f1);
    doc1.setBoost(100);
    writer.addDocument(doc1);
    Document doc2 = new Document();
    Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED);
    doc2.add(f2);
    doc2.setBoost(200);
    writer.addDocument(doc2);
    Document doc3 = new Document();
    Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED);
    doc3.add(f3);
    doc3.setBoost(300);
    writer.addDocument(doc3);
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10);
    for (ScoreDoc doc : docs.scoreDocs) {
        System.out.println("docid : " + doc.doc + " score : " + doc.score + " "
                + searcher.doc(doc.doc).get("contents"));
    }
} 

The output is :

 docid : 0 score : 1.0 common1
 docid : 1 score : 1.0 common2
 docid : 2 score : 1.0 common3
like image 302
Keven Avatar asked Jun 17 '10 10:06

Keven


People also ask

How is Lucene score calculated?

Lucene uses a combination of the Vector Space Model (VSM) and the Boolean model of information Retrieval to determine how relevant a document is to a user's query. It assigns a default score between 0 and 1 to all search results, depending on multiple factors related to document relevancy.

What is boosting in Lucene?

Score Boosting Lucene allows influencing search results by "boosting" in more than one level: Document level boosting - while indexing - by calling document. setBoost() before a document is added to the index.

How Lucene work?

Simply put, Lucene uses an “inverted indexing” of data – instead of mapping pages to keywords, it maps keywords to pages just like a glossary at the end of any book. This allows for faster search responses, as it searches through an index, instead of searching through text directly.


1 Answers

By default, PrefixQuery rewrites the query to use ConstantScoreQuery, which gives every single matching document a score of 1.0. I think this is to make PrefixQuery faster. So your boosts are getting ignored.

If you want the boosts to take effect in your PrefixQuery, you need to call setRewriteMethod(), using the SCORING_BOOLEAN_QUERY_REWRITE constant on your prefix query instance. See http://lucene.apache.org/java/2_9_1/api/all/index.html .

For debugging, you can use searcher.explain().

like image 131
bajafresh4life Avatar answered Oct 20 '22 22:10

bajafresh4life