I have a company field in a Lucene index. One of the indexed company names is Moody's.
When a user types any of the following keywords, I want this company to come up in the search results:
1. Moo
2. Mood
3. Moodys
4. Moody's
How should I index this field in Lucene, and what type of Lucene query should I use to get this behavior?
Thanks.
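Based on your clarifications, I want to divide your question into two parts and answer each in turn: (1) making "Moodys" and "Moody's" match the indexed name, and (2) matching partial prefixes such as "Moo" and "Mood".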
Part 1 is relatively easy: use a StandardTokenizer, which keeps the apostrophe and "s" attached to the preceding word as a single token, followed by a StandardFilter, which removes the trailing apostrophe and "s". This converts "Moody's" to "Moody". A StandardAnalyzer does this and more (lowercasing and stop-word removal), which may be more than you need. To make "Moodys" and "Moody" reduce to the same token, add a stemmer; try SnowballFilter for this.
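To make the chain concrete, here is a plain-Python toy model of it. It is not the Lucene API: the names `tokenize`, `strip_possessive`, `crude_stem` and `analyze` are hypothetical, and `crude_stem` is a deliberately simplistic stand-in for a real stemmer such as Snowball. In Lucene you would compose the real tokenizer and filters in an Analyzer instead. Also note that, as far as I know, since Lucene 3.1 StandardTokenizer follows Unicode text segmentation and the older apostrophe handling described above lives in ClassicTokenizer/ClassicFilter, so check which behavior your version gives you.

```python
import re

# Toy model of the analysis chain described above (NOT Lucene code):
# tokenize -> lowercase -> strip possessive 's -> crude stemming.

# A word, optionally followed by an apostrophe suffix, so "Moody's" stays one token.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text: str) -> list[str]:
    """Split on non-alphanumerics, keeping an apostrophe suffix attached."""
    return _TOKEN_RE.findall(text)


def strip_possessive(token: str) -> str:
    """Drop a trailing 's (the role the StandardFilter step plays above)."""
    return token[:-2] if token.endswith("'s") else token


def crude_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer such as Snowball:
    removes one plural 's' but leaves words ending in 'ss' alone."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    """Apply the whole toy chain to a piece of text."""
    return [crude_stem(strip_possessive(t.lower())) for t in tokenize(text)]
```

With this chain, `analyze("Moody's")`, `analyze("Moodys")` and `analyze("Moody")` all produce the same single token, which is the property part 1 needs. The key design point carries over to Lucene: the same analysis must run at index time and at query time, otherwise the tokens will not line up.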
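Part 2 is harder. Lucene's PrefixQuery, to which Alan alluded, only matches indexed terms that begin with the given prefix; if the company field is indexed as a single untokenized term, it will only work when the typed text matches the start of the whole field value. You need something like the answer to this question about auto-complete in Lucene.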
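One common way to get prefix matches on any word of a field (not necessarily the approach in the linked auto-complete question) is to index edge n-grams: every leading substring of every word becomes its own term, so a plain exact-term lookup on "Moo" or "Mood" finds the document. In Lucene this is typically done with an edge n-gram token filter (e.g. EdgeNGramTokenFilter) in the index-time analyzer only. The Python below is just a conceptual model of that idea; `PrefixIndex`, `edge_ngrams` and `normalize` are hypothetical names, and the default `max_gram` of 20 is an arbitrary assumption.

```python
from collections import defaultdict

# Conceptual model of an edge n-gram index (NOT Lucene code): each word is
# indexed under all of its leading substrings, so a partial query can match
# any word in the company name, not just the first one.


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Return the leading substrings of `word`, min_gram..max_gram chars long."""
    upper = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, upper + 1)]


def normalize(word: str) -> str:
    """Lowercase and drop a possessive 's, so "Moody's" is indexed as "moody"."""
    word = word.lower()
    return word[:-2] if word.endswith("'s") else word


class PrefixIndex:
    def __init__(self) -> None:
        # gram -> ids of documents containing a word that starts with that gram
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._docs: list[str] = []

    def add(self, company: str) -> int:
        """Index a company name and return its document id."""
        doc_id = len(self._docs)
        self._docs.append(company)
        for word in company.split():
            for gram in edge_ngrams(normalize(word)):
                self._postings[gram].add(doc_id)
        return doc_id

    def search(self, typed: str) -> list[str]:
        """Treat what the user typed as a prefix of some word in the name."""
        hits = self._postings.get(normalize(typed.strip()), set())
        return [self._docs[i] for i in sorted(hits)]
```

For example, after `idx.add("Moody's")` and `idx.add("Standard & Poor's")`, searching "Moo", "Mood" or "Moody's" returns Moody's, and "Poo" returns Standard & Poor's even though "Poor's" is not the first word. The trade-off is a larger index (one term per prefix) in exchange for cheap exact-term lookups at query time. "Moodys" is still not found by this sketch alone; that needs the part 1 normalization applied consistently at index and query time as well.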
The StandardAnalyzer should work for keywords 3 and 4 (Moodys and Moody's), but it won't work for 1 and 2 (Moo and Mood).
Without writing your own (complex) text analyzer, I would think about how you expect company names to be searched for. For example, basic Lucene query syntax means you could find "Moody's" by searching with the wildcards "Moo*" and "Mood*". You might therefore consider appending "*" to the search term before submitting it to Lucene; however, this may confuse users who don't know that a wildcard is being added behind the scenes.
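The "append a wildcard" idea can be sketched as a small query-string builder. This is a hypothetical helper, not Lucene code; in Java, the classic QueryParser provides a static `QueryParser.escape` for the escaping step, and the special-character list below is my assumption based on the classic query syntax, so check it against your Lucene version. The helper lowercases the input itself because wildcard terms may not be passed through the analyzer.

```python
# Characters treated as special by Lucene's classic query syntax
# (assumption: check this list against your Lucene version).
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters so user input stays literal."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def to_prefix_query(user_input: str) -> str:
    """Hypothetical helper: turn each typed word into a prefix (wildcard) term,
    e.g. "Moo" -> "moo*". Lowercases here because wildcard terms may not be
    analyzed by the query parser."""
    terms = user_input.lower().split()
    return " ".join(escape_term(t) + "*" for t in terms)
```

Escaping first matters: without it, input such as "a+b" would be interpreted as query operators rather than text. Because the wildcard term is compared against the indexed terms as-is, it only helps when the typed prefix matches what the analyzer actually stored.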