
Lucene .net Boost not working when using * wildcard

I have two documents indexed with StandardAnalyzer. I investigated them with Luke and confirmed that my code shows the same behavior.

Document one with boost 1

stored/uncompressed,indexed,tokenized<Description:Nummer ett>
stored/uncompressed,indexed,tokenized<Id:2>
stored/uncompressed,indexed,tokenized<Name:Apa>

Document two with boost 2

stored/uncompressed,indexed,tokenized<Description:Nummer två>
stored/uncompressed,indexed,tokenized<Id:1>
stored/uncompressed,indexed,tokenized<Name:Apa>

Searching for apa in the field Name returns both documents with the boost applied and in the correct order.

Document 2 has Score 1.1891
Document 1 has Score 0.5945

Searching for ap* returns both documents in no particular order and with the same score.

Document 1 Score 1.0000
Document 2 Score 1.0000

Searching for apa* returns both documents in no particular order and with the same score.

Document 1 Score 1.0000
Document 2 Score 1.0000

Why is this? I would like documents with a higher boost value to rank higher even when I have to use wildcards. Is this possible?

Cheers all cool coders out there!

This is what I want to accomplish.

I have a search string and want matches using a wildcard, for example a search for "Lu" + "*".

Document
 Name
 City

I would like the document whose Name is Lund to rank higher than a document whose Name is Lunt or whose City is Lund, for example. This is because I know which documents are the most popular. I want to get the documents with City Stockholm and with Name Stockholm or Stockholmen, but ordered as I choose.

JustusTh asked Apr 27 '12 14:04



1 Answer

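WildcardQuery is a subclass of MultiTermQuery, which by default is rewritten to a constant-score query. That is why every match gets a constant score of 1 and the document boost has no effect.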

If you check the definition of t.getBoost():

t.getBoost() is a search time boost of term t in the query q as specified in the query text (see query syntax), or as set by application calls to setBoost(). Notice that there is really no direct API for accessing a boost of one term in a multi term query, but rather multi terms are represented in a query as multi TermQuery objects, and so the boost of a term in the query is accessible by calling the sub-query getBoost()

http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html#formula_termBoost

One possible hack is to set the rewrite method on the query parser, so that the wildcard is expanded into scored term queries:

myCustomQueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
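
To illustrate why the rewrite method changes the ordering, here is a minimal toy model in Python. It is not Lucene code and does not reproduce Lucene's scoring formula: the index layout and the function names `constant_score_search`, `scoring_boolean_search` and `ranked` are made up for this sketch, and the "score" is reduced to the document boost alone.

```python
from fnmatch import fnmatchcase

# Toy index: document id -> (index-time boost, terms in the Name field).
# The values mirror the question; everything else is a simplification.
INDEX = {
    1: (1.0, ["apa"]),
    2: (2.0, ["apa"]),
}

def constant_score_search(pattern):
    """Model of a constant-score rewrite: every matching document gets 1.0."""
    return {
        doc_id: 1.0
        for doc_id, (_boost, terms) in INDEX.items()
        if any(fnmatchcase(term, pattern) for term in terms)
    }

def scoring_boolean_search(pattern):
    """Model of a scoring rewrite: the pattern expands to the matching terms,
    and each matched term contributes a score scaled by the document boost."""
    scores = {}
    for doc_id, (boost, terms) in INDEX.items():
        matched = [term for term in terms if fnmatchcase(term, pattern)]
        if matched:
            scores[doc_id] = boost * len(matched)
    return scores

def ranked(scores):
    """Order document ids by descending score."""
    return sorted(scores, key=scores.get, reverse=True)

print(constant_score_search("ap*"))           # both documents tie at 1.0
print(ranked(scoring_boolean_search("ap*")))  # the boosted document comes first
```

In this model the constant-score variant cannot tell the two documents apart, while the scoring variant puts document 2 first. In real Lucene the scores would also depend on term frequency, inverse document frequency and norms. Note too that a scoring boolean rewrite can fail with a TooManyClauses exception when the wildcard expands to more terms than the BooleanQuery clause limit allows.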
ZeNo answered Nov 03 '22 06:11
