
In Lucene, why do my boosted and unboosted documents get the same score?

At index time I am boosting certain documents in this way:

if (myCondition)  
{
   document.SetBoost(1.2f);
}

But at search time, documents that are identical in every other respect all end up with the same score, whether they pass or fail myCondition.

And here is the search code:

BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST);
booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes.BLOGGER)), BooleanClause.Occur.MUST_NOT);
indexSearcher.Search(booleanQuery, 10);

Can you tell me what I need to do so that the boosted documents get a higher score?

Many Thanks!

asked Oct 26 '11 05:10 by Barka



1 Answer

Lucene encodes boosts in a single byte (a float normally takes four bytes) using the SmallFloat#floatToByte315 method. As a consequence, a lot of precision can be lost when the byte is converted back to a float.

In your case, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.2f)) returns 1f because 1f and 1.2f are too close to each other to be told apart after encoding. Try a bigger boost so that your documents get different scores. For example, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.25f)) gives 1.25f.
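
If you want to check a candidate boost without pulling in Lucene, the round trip can be approximated in plain Java. This is only a minimal sketch: the class SmallFloatSketch and the helper roundTrip are hypothetical names, and the bit manipulation is my recollection of what SmallFloat does, so treat it as an assumption and compare it against the SmallFloat source of your Lucene version.

```java
// Hypothetical standalone sketch; this is NOT Lucene's own SmallFloat class.
// The bit layout below is an assumption modelled on SmallFloat#floatToByte315.
class SmallFloatSketch {
    // Offset between the float exponent bias and the byte format's zero exponent (assumed to be 15).
    private static final int FZERO = (63 - 15) << 3;

    // Encode a float into one byte by dropping the low-order mantissa bits.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= FZERO) {
            // Zero and negative values map to 0; positive underflow maps to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest representable value
        }
        return (byte) (small - FZERO);
    }

    // Decode the byte back into a float; the dropped mantissa bits come back as zeros.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper: the boost value that survives the encode/decode round trip.
    static float roundTrip(float boost) {
        return byte315ToFloat(floatToByte315(boost));
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1.0f, 1.2f, 1.25f, 1.5f}) {
            System.out.println(boost + " -> " + roundTrip(boost));
        }
    }
}
```

With this sketch, roundTrip(1.2f) comes back as 1.0f while roundTrip(1.25f) survives unchanged, which matches the behaviour described above. A boost is only useful if its round-tripped value differs from the round-tripped value of the unboosted documents.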

answered Nov 01 '22 01:11 by jpountz