Boosting Lucene Terms When Building the Index

Is it possible to specify that certain terms are more important than others when creating the index (not when querying it)?

Consider for example a synonym filter:
doc 1: "this is a nice car"
doc 2: "this is a nice vehicle"

I want to add the term "vehicle" to the first doc and the term "car" to the second doc. However, if the index is later queried with the word "car", the first document should score higher than the second, and if it is queried for "vehicle", it should be the other way around.

Will calling setBoost on the fields before adding them to their respective documents do the trick?

Or maybe I should add the synonyms to a different field name?

Or am I looking at this from the wrong point of view?

Thanks

epeleg asked Jan 16 '12 13:01


1 Answer

Setting a boost on a field affects all terms in that field, so this wouldn't work in your case.
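To see why a field-level boost can't separate the two synonyms, here is a toy model in plain Java. It is not Lucene's API: the `Doc` record and `score` method are hypothetical. After synonym expansion both documents contain both "car" and "vehicle", and a single boost per field multiplies every matching term alike, so whichever document gets the larger boost wins for both queries.

```java
import java.util.List;

// Toy model (not Lucene): a document is its field's terms plus ONE boost for the whole field.
public class FieldBoostDemo {
    record Doc(String name, List<String> terms, float fieldBoost) {}

    // Hypothetical scoring: term frequency multiplied by the field-level boost.
    static float score(Doc doc, String queryTerm) {
        long tf = doc.terms().stream().filter(queryTerm::equals).count();
        return tf * doc.fieldBoost();
    }

    public static void main(String[] args) {
        // Both docs contain "car" and "vehicle" after synonym expansion.
        Doc doc1 = new Doc("doc1", List.of("nice", "car", "vehicle"), 2.0f);
        Doc doc2 = new Doc("doc2", List.of("nice", "vehicle", "car"), 1.0f);
        for (String q : List.of("car", "vehicle")) {
            // The field boost scales every term, so doc1 wins for BOTH queries.
            System.out.println(q + ": doc1=" + score(doc1, q) + " doc2=" + score(doc2, q));
        }
    }
}
```

Swapping the boosts only flips which document wins both queries; it never makes the ranking depend on the query term.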

But it should be possible using Lucene payloads (a byte array that can be attached to each term occurrence). You would use them to store term-specific boosts (for example, 0.5 for "vehicle" in doc 1). Then implement your own Similarity, override its scorePayload() method to decode that boost, and query with PayloadTermQuery, which lets the boost stored in each term's payload contribute to the score.
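A minimal sketch of the payload idea, again as a self-contained toy rather than real Lucene code: `encodeBoost`, `decodeBoost`, `index` and `score` are hypothetical helpers. Each term gets its own 4-byte float payload (big-endian, which as far as I know matches the layout Lucene's PayloadHelper uses for floats), and the scorer multiplies a base score by the decoded value. That multiplication is the role a custom scorePayload() would play in Lucene.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-term payload boosts (NOT the Lucene API).
public class PayloadBoostDemo {
    // Encode a float boost as 4 big-endian bytes (ByteBuffer defaults to big-endian).
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Decode a payload; a missing payload means "no extra boost".
    static float decodeBoost(byte[] payload) {
        return payload == null ? 1.0f : ByteBuffer.wrap(payload).getFloat();
    }

    // Hypothetical index: doc name -> (term -> payload bytes).
    static final Map<String, Map<String, byte[]>> INDEX = new HashMap<>();

    // Index time: store one payload per term.
    static void index(String doc, Map<String, Float> termBoosts) {
        Map<String, byte[]> postings = new HashMap<>();
        termBoosts.forEach((term, boost) -> postings.put(term, encodeBoost(boost)));
        INDEX.put(doc, postings);
    }

    // Query time: stand-in for scorePayload(), scaling a base score by the stored boost.
    static float score(String doc, String term) {
        Map<String, byte[]> postings = INDEX.get(doc);
        if (postings == null || !postings.containsKey(term)) return 0f;
        float baseScore = 1.0f; // pretend all other scoring factors are equal
        return baseScore * decodeBoost(postings.get(term));
    }

    public static void main(String[] args) {
        // Original words keep boost 1.0; injected synonyms get 0.5.
        index("doc1", Map.of("nice", 1.0f, "car", 1.0f, "vehicle", 0.5f));
        index("doc2", Map.of("nice", 1.0f, "vehicle", 1.0f, "car", 0.5f));
        for (String q : List.of("car", "vehicle")) {
            System.out.println(q + ": doc1=" + score("doc1", q) + " doc2=" + score("doc2", q));
        }
    }
}
```

Running `main` ranks doc1 first for "car" and doc2 first for "vehicle", which is the behavior the question asks for. In real Lucene the per-term boosts would typically be attached at analysis time (for example with DelimitedPayloadTokenFilter and a FloatEncoder), and the exact scorePayload() signature differs between Lucene versions, so check the documentation for the version you use.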

milan answered Oct 02 '22 16:10