I have a problem with score calculation for a PrefixQuery. To change the score of each document, I call setBoost on the document before adding it to the index. I then search with a PrefixQuery, but the results do not change according to the boost. It seems that setBoost has no effect at all for a PrefixQuery. Please check my code below:
@Test
public void testNormsDocBoost() throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);

    Document doc1 = new Document();
    Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED);
    doc1.add(f1);
    doc1.setBoost(100);
    writer.addDocument(doc1);

    Document doc2 = new Document();
    Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED);
    doc2.add(f2);
    doc2.setBoost(200);
    writer.addDocument(doc2);

    Document doc3 = new Document();
    Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED);
    doc3.add(f3);
    doc3.setBoost(300);
    writer.addDocument(doc3);
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10);
    for (ScoreDoc doc : docs.scoreDocs) {
        System.out.println("docid : " + doc.doc + " score : " + doc.score + " "
                + searcher.doc(doc.doc).get("contents"));
    }
}
The output is:
docid : 0 score : 1.0 common1
docid : 1 score : 1.0 common2
docid : 2 score : 1.0 common3
Lucene uses a combination of the Vector Space Model (VSM) and the Boolean model of information retrieval to determine how relevant a document is to a user's query. It assigns every search result a relevance score, which depends on multiple factors related to document relevancy.
Score boosting: Lucene allows influencing search results by "boosting" at more than one level. Document-level boosting is applied while indexing, by calling document.setBoost() before a document is added to the index.
Simply put, Lucene uses an "inverted index" of the data: instead of mapping pages to keywords, it maps keywords to pages, just like the index at the end of a book. This allows for faster search responses, because a search looks terms up in the index instead of scanning the text directly.
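To make the keyword-to-page idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration and not how Lucene stores its index; the class name InvertedIndexSketch and its methods are made up for this example, and tokenization is assumed to be a simple lowercase whitespace split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy inverted index, not Lucene's actual data structure.
public class InvertedIndexSketch {
    // term -> ids of the documents that contain it, in ascending order
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Assigns the next doc id and records every whitespace-separated term under it. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Skip the insert when the term repeats inside the same document.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Looks the term up in the map; no document text is scanned. */
    public List<Integer> search(String term) {
        List<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<>() : docs;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument("lucene scores documents");
        index.addDocument("lucene boosts documents");
        System.out.println(index.search("lucene")); // prints [0, 1]
        System.out.println(index.search("boosts")); // prints [1]
    }
}
```

A search is a single map lookup per term, which is why it does not get slower as the individual documents get longer.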
By default, PrefixQuery rewrites the query to use ConstantScoreQuery, which gives every matching document the same score of 1.0. I think this is done to make PrefixQuery faster. As a result, your document boosts are ignored.
If you want the boosts to take effect in your PrefixQuery, call setRewriteMethod() on your prefix query instance and pass it the MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE constant. See http://lucene.apache.org/java/2_9_1/api/all/index.html .
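As a rough sketch of that change, the example below rebuilds the index from the question and switches the rewrite method before searching. It assumes the same Lucene 2.9/3.0-era API that the question uses; the class name, the addDoc helper and the rankedContents method are hypothetical names introduced only for this example.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Sketch only: assumes a Lucene 2.9/3.0 jar on the classpath, as in the question.
public class ScoringPrefixQueryExample {

    // Hypothetical helper: indexes one single-field document with the given boost.
    static void addDoc(IndexWriter writer, String text, float boost) throws IOException {
        Document doc = new Document();
        doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED));
        doc.setBoost(boost);
        writer.addDocument(doc);
    }

    // Returns the stored "contents" values of the hits, best score first.
    static String[] rankedContents() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);
        addDoc(writer, "common1", 100f);
        addDoc(writer, "common2", 200f);
        addDoc(writer, "common3", 300f);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        PrefixQuery query = new PrefixQuery(new Term("contents", "common"));
        // Rewrite into a scoring BooleanQuery instead of a constant-score query,
        // so the norms (which carry the document boost) contribute to the score.
        query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

        TopDocs docs = searcher.search(query, 10);
        String[] ranked = new String[docs.scoreDocs.length];
        for (int i = 0; i < ranked.length; i++) {
            ranked[i] = searcher.doc(docs.scoreDocs[i].doc).get("contents");
        }
        searcher.close();
        return ranked;
    }

    public static void main(String[] args) throws IOException {
        for (String contents : rankedContents()) {
            System.out.println(contents);
        }
    }
}
```

With this rewrite method the scores should no longer all be 1.0, and the document with the largest boost should rank first. Keep in mind that a scoring boolean rewrite creates one clause per matching term, so a prefix that expands to very many terms can be slower or can hit BooleanQuery's clause limit.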
For debugging, you can use searcher.explain(), which describes how the score of a given document was computed for a query.
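A small sketch of how explain() could be used to compare the two rewrite modes on a single boosted document is shown here. It again assumes the Lucene 2.9/3.0-era API from the question; the class name and the explainFirstDoc method are hypothetical and exist only for this illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Sketch only: assumes a Lucene 2.9/3.0 jar on the classpath, as in the question.
public class ExplainPrefixQueryExample {

    // Hypothetical helper: indexes one boosted document and explains its score
    // for the prefix query, with or without the scoring rewrite method.
    static Explanation explainFirstDoc(boolean scoringRewrite) throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);
        Document doc = new Document();
        doc.add(new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED));
        doc.setBoost(100);
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
        PrefixQuery query = new PrefixQuery(new Term("contents", "common"));
        if (scoringRewrite) {
            query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        }
        // Document id 0 is the only document in this index.
        Explanation explanation = searcher.explain(query, 0);
        searcher.close();
        return explanation;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Default rewrite:");
        System.out.println(explainFirstDoc(false));
        System.out.println("Scoring boolean rewrite:");
        System.out.println(explainFirstDoc(true));
    }
}
```

Printing an Explanation gives an indented breakdown of the factors behind the score, which makes it easy to see whether the field norm, and with it the document boost, took part in the calculation.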