I'm working on a company search API using Lucene. My Lucene company index contains two companies: 1. Abigail Adams National Bancorp, Inc. 2. National Bancorp
If the user types in National Bancorp, then only company #2 (i.e. National Bancorp) should be returned and not #1. In other words, only exact matches should be returned. How do I achieve this?
Thanks for reading.
You can use KeywordAnalyzer to index and search on this field. KeywordAnalyzer generates a single token for the entire string, so a query matches only when it equals the whole company name.
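To make the single-token idea concrete, here is a minimal plain-Java sketch that models it with a map from the whole name to document ids. It is not Lucene's API: the class `KeywordIndexSketch` and its methods are hypothetical names, and the lower-casing in `toToken` is an added assumption (KeywordAnalyzer itself leaves the string unchanged, so matching there is case-sensitive).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of single-token ("keyword") indexing; this is not Lucene's API.
class KeywordIndexSketch {
    // Maps the one token produced per company name to the ids of matching documents.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Assumed normalisation step: trim and lower-case so lookups ignore case.
    static String toToken(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    // Index the whole name as a single token.
    void add(int docId, String companyName) {
        postings.computeIfAbsent(toToken(companyName), k -> new ArrayList<>()).add(docId);
    }

    // A query matches only if it equals an indexed name in full.
    List<Integer> search(String query) {
        return postings.getOrDefault(toToken(query), List.of());
    }

    public static void main(String[] args) {
        KeywordIndexSketch index = new KeywordIndexSketch();
        index.add(1, "Abigail Adams National Bancorp, Inc.");
        index.add(2, "National Bancorp");
        System.out.println(index.search("National Bancorp")); // prints [2]
    }
}
```

The trade-off is that partial queries such as Bancorp match nothing, which is what the question asks for but may be too strict for a general search box.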
I searched a lot for the same problem without finding anything helpful. After scratching my head for a while I found a solution: put the search string in double quotes.
National Bancorp will return both #1 and #2, but "National Bancorp" will return only #2. Note that this relies on the field being indexed as a single token; on a field tokenized into words, the quoted phrase also occurs inside #1.
This is something that may warrant the use of the shingle filter. This filter groups adjacent words together into tokens. For example, Abigail Adams National Bancorp with a ShingleFilter with a maximum shingle size of 3 would produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].
If a user then queries for National Bancorp, you will get an exact match on National Bancorp itself, and a lower-scored match on Abigail Adams National Bancorp (lower scored because this one has many more tokens in the field, which lowers the field-length norm rather than the idf). I think it makes sense to return both documents on such a query.
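As an illustration of the token list above, here is a small plain-Java sketch that generates word shingles up to a maximum size. It only mimics the output; it does not use Lucene's ShingleFilter, and `ShingleSketch` and `shingles` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of word shingles; this is not Lucene's ShingleFilter.
class ShingleSketch {
    // Returns every run of 1..maxShingleSize adjacent words, grouped by start word.
    static List<String> shingles(String text, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        // Split on whitespace only, as a simple whitespace tokenizer would.
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            // Grow the shingle one word at a time until the size limit or end of input.
            for (int size = 1; size <= maxShingleSize && start + size <= words.length; size++) {
                if (size > 1) {
                    shingle.append(' ');
                }
                shingle.append(words[start + size - 1]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the nine tokens for the four-word company name.
        System.out.println(shingles("Abigail Adams National Bancorp", 3));
    }
}
```

Because the shingles of the longer name include National Bancorp as one token, a query for that token finds both companies, which is why this approach ranks results rather than filtering them.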
You may want to apply the shingle filter at query time as well, depending on the use case.