
Finding exact match using Lucene search API

I'm working on a company search API using Lucene. My Lucene company index contains two companies:

1. Abigail Adams National Bancorp, Inc.
2. National Bancorp

If the user types in National Bancorp, then only company #2 (i.e. National Bancorp) should be returned and not #1. In other words, only exact matches should be returned. How do I achieve this?

Thanks for reading.

Steve Chapman asked Jun 10 '09 18:06


3 Answers

You can use KeywordAnalyzer to index and search this field. KeywordAnalyzer emits the entire string as a single token, so a query matches only when it equals the whole field value.
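To make the single-token idea concrete, here is a minimal plain-Python sketch. It does not use Lucene's API; the helper names (`keyword_tokens`, `whitespace_tokens`, `build_index`, `term_lookup`) are hypothetical and only mimic how a keyword-style analyzer differs from a word-splitting one.

```python
from collections import defaultdict

# The two companies from the question, keyed by their numbers.
COMPANIES = {
    1: "Abigail Adams National Bancorp, Inc.",
    2: "National Bancorp",
}

def keyword_tokens(text):
    """Mimic a keyword-style analyzer: the whole string is one token."""
    return [text]

def whitespace_tokens(text):
    """Mimic a word-splitting analyzer: one token per whitespace-separated word."""
    return text.split()

def build_index(docs, tokenize):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def term_lookup(index, term):
    """Return the sorted ids of documents that contain exactly this token."""
    return sorted(index.get(term, set()))

keyword_index = build_index(COMPANIES, keyword_tokens)
word_index = build_index(COMPANIES, whitespace_tokens)

# With one token per field, only the company whose full name equals the query matches.
exact_hits = term_lookup(keyword_index, "National Bancorp")

# With one token per word, a single word matches both companies.
word_hits = term_lookup(word_index, "National")

print(exact_hits, word_hits)
```

Because the whole value is one token, matching is also case-sensitive in this sketch: looking up "national bancorp" finds nothing. KeywordAnalyzer does not lowercase either, so you may want to normalize case yourself at both index and query time.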

Shashikant Kore answered Nov 02 '22 04:11


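I googled a lot for the same problem without success. After scratching my head for a while, I found the solution: put the search string in double quotes.

Unquoted, National Bancorp will return both #1 and #2, but quoted, "National Bancorp" is parsed as a single phrase and will return only #2. (This depends on how the field is analyzed: on a field that is split into words, a phrase query also matches any document that contains the two words next to each other.)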

Somonath Sabat answered Nov 02 '22 02:11


This may warrant the use of a shingle filter, which groups adjacent words into multi-word tokens. For example, with a ShingleFilter set to a maximum shingle size of 3 (and assuming a simple WhitespaceAnalyzer), Abigail Adams National Bancorp would produce [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].

If a user then queries for National Bancorp, you will get an exact match on National Bancorp itself, and a lower-scored match on Abigail Adams National Bancorp (lower-scored because that field contains many more tokens, which lowers its length normalization factor). I think it makes sense to return both documents for such a query.

You may want to apply the shingle filter at query time as well, depending on the use case.
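As a rough illustration of the shingle idea, the following plain-Python sketch generates shingles of up to three words and ranks the two company names for the query. It is not Lucene's ShingleFilter or its scoring formula; `shingles` and `length_score` are hypothetical helpers, and the score is only a crude stand-in for "a longer field scores lower".

```python
import math

def shingles(text, max_size=3):
    """Return every run of 1..max_size adjacent whitespace-separated words."""
    tokens = text.split()
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out

def length_score(shingle_list, term):
    """Crude stand-in for length normalization: 1/sqrt(token count) on a match, else 0."""
    if term not in shingle_list:
        return 0.0
    return 1.0 / math.sqrt(len(shingle_list))

doc1 = shingles("Abigail Adams National Bancorp")
doc2 = shingles("National Bancorp")

# The query is treated as one shingle, so it matches a single indexed token.
query = "National Bancorp"
ranked = sorted(
    [("Abigail Adams National Bancorp", length_score(doc1, query)),
     ("National Bancorp", length_score(doc2, query))],
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

Both names match, but the shorter one ranks first because its field holds fewer shingles. If only the top hit should be shown, the caller can keep just the first result.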

wesen answered Nov 02 '22 03:11