
Index time field level boosting in Lucene 6.6.0?

In Lucene 6.6.0 and later, field-level index-time boosting is deprecated. The documentation states:

Index-time boosts are deprecated, please index index-time scoring factors into a doc value field and combine them with the score at query time using eg. FunctionScoreQuery.

Previously one would boost a field at index time like so:

    Field title = new Field(PaperDAO.LUCENE_FIELD_TITLE, titleStr, fieldType);
    title.setBoost(3.00f);
    document.add(title);

    Field authors = new Field(PaperDAO.LUCENE_FIELD_AUTHOR, StringEscapeUtils.unescapeHtml4(this.getAuthorsForLucene()), fieldType);
    authors.setBoost(10.00f);
    document.add(authors);

I do not understand how the suggested FunctionScoreQuery is an appropriate replacement for field level boosting, as one constructs a FunctionScoreQuery given only an existing Query and a DoubleValuesSource representing the boost value for only one of potentially many fields:

    // INDEX TIME
    Field title = new Field(PaperDAO.LUCENE_FIELD_TITLE, titleStr, fieldType);
    document.add(title);
    document.add(new FloatDocValuesField(PaperDAO.LUCENE_FIELD_TITLE + "_boost", 3.00f));

    // QUERY TIME
    Query boosted = new FunctionScoreQuery(query, DoubleValuesSource.fromFloatField(PaperDAO.LUCENE_FIELD_TITLE + "_boost"));

Can someone please explain the appropriate replacement for Field#setBoost at index time in Lucene >= 6.6.0? Are we supposed to enumerate all possible fields at query time and apply the relevant boost to each? If so, how is that query constructed?

loopforever asked Aug 22 '17 15:08


2 Answers

First of all, you still have some time to use old-style index-time boosts, since they will only be removed in Lucene 7.0 :)

Moving on to the subject: the community decided a long time ago that index-time boosting is a complex technique that is hard to get right.

As I understand it, the current idea is not to replace each per-field index-time boost with its own per-field doc values field, but rather to replace all index-time boosts for a document with a single accumulated score stored in one doc values field, and then use that value during search.

please index index-time scoring factors into a doc value field and combine them with the score at query time

The quote is from the Javadoc, which only strengthens my belief in this idea: you could index multiple factors into just one field.

The open question for me is how to combine several factors into one. I expect that is something to test and validate (multiplication, a sum, or some linear combination).
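
One way to experiment with those options is a small helper that folds the per-field boosts into a single per-document value before it is written to a doc values field. This is only a sketch: `BoostCombiner` and its method names are hypothetical, the 3.0 and 10.0 factors are the title and author boosts from the question, and the 0.25/0.75 weights are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: folds several scoring factors into one per-document value. */
final class BoostCombiner {
    private BoostCombiner() {}

    /** Multiplies all factors; a factor of 1.0 is neutral. */
    static double product(double... factors) {
        return Arrays.stream(factors).reduce(1.0, (a, b) -> a * b);
    }

    /** Adds all factors; a factor of 0.0 is neutral. */
    static double sum(double... factors) {
        return Arrays.stream(factors).sum();
    }

    /** Weighted linear combination: the sum of weights[i] * factors[i]. */
    static double linear(double[] weights, double[] factors) {
        if (weights.length != factors.length) {
            throw new IllegalArgumentException("weights and factors must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < factors.length; i++) {
            total += weights[i] * factors[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Title and author boosts taken from the question.
        double titleBoost = 3.0;
        double authorBoost = 10.0;

        System.out.println(product(titleBoost, authorBoost)); // 30.0
        System.out.println(sum(titleBoost, authorBoost));     // 13.0
        // Illustrative weights only; real weights would need tuning.
        System.out.println(linear(new double[] {0.25, 0.75},
                                  new double[] {titleBoost, authorBoost})); // 8.25
    }
}
```

The combined value could then be stored once per document, for example in a `FloatDocValuesField` as in the question. A single document-level value no longer records which field matched, so whether this is an acceptable substitute depends on the use case.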

Mysterion answered Oct 23 '22 09:10


If you want to boost different fields using FunctionScoreQuery, the suggested method is the following (taken from CustomScoreProvider):

For more complex custom scores, use the lucene-expressions library

    SimpleBindings bindings = new SimpleBindings();
    bindings.add("score", DoubleValuesSource.SCORES);
    bindings.add("boost1", DoubleValuesSource.fromIntField("myboostfield"));
    bindings.add("boost2", DoubleValuesSource.fromIntField("myotherboostfield"));
    Expression expr = JavascriptCompiler.compile("score * (boost1 + ln(boost2))");
    FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
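
To make the arithmetic concrete, the same formula can be evaluated by hand for a single document. The sketch below is plain Java with no Lucene involved; `ExpressionArithmetic` and the sample values are hypothetical, and `Math.log` stands in for the expression's `ln` (natural logarithm).

```java
/** Plain-Java re-creation of "score * (boost1 + ln(boost2))" for one document. */
final class ExpressionArithmetic {
    private ExpressionArithmetic() {}

    /** Mirrors the expression string; Math.log is the natural logarithm. */
    static double combine(double score, double boost1, double boost2) {
        return score * (boost1 + Math.log(boost2));
    }

    public static void main(String[] args) {
        // Hypothetical values: a query score of 2.0 and a first boost of 3.
        // With boost2 = 1, ln(1) = 0, so the result is 2.0 * 3.0.
        System.out.println(combine(2.0, 3.0, 1.0)); // 6.0

        // A larger boost2 adds only its logarithm to boost1.
        System.out.println(combine(2.0, 3.0, 100.0));

        // ln(0) is negative infinity, which carries through the whole formula.
        System.out.println(combine(2.0, 3.0, 0.0)); // -Infinity
    }
}
```

Because of the logarithm, the second boost field contributes nothing at 1 and grows slowly after that. A stored value of 0 would drive this formula to negative infinity, so it is probably worth keeping that field at 1 or above.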

Peter answered Oct 23 '22 10:10