
Lucene case sensitive & insensitive search

Tags:

java

lucene

I have a Lucene index which is currently case sensitive. I want to add the option of having a case insensitive search as a fall-back. This means that results that match the case will get more weight and will appear first. For example, if the number of results is limited to 10, and there are 10 matches which match my case, this is enough. If I only found 7 results, I can add 3 more results from the case-insensitive search.

My case is actually more complex, since I have items with different weights. Ideally, having a match with "wrong" case will add some weight. Needless to say, I do not want duplicate results.

One possible approach is to have two indexes, one case sensitive and one case insensitive, and search both. Naturally, there's some redundancy here, since I need to index everything twice.

Is there a better solution? Ideas?

zvikico asked Mar 21 '10 16:03



2 Answers

Lucene search is case sensitive. It only feels case insensitive because input is usually lower-cased on its way through QueryParser. In other words, to keep case sensitivity, don't lower-case your input before indexing and don't lower-case your queries, i.e. pick an Analyzer that doesn't lower-case, such as KeywordAnalyzer.

See `QueryParser.setLowercaseExpandedTerms(boolean lowercaseExpandedTerms)`.

You can index the terms using a case-sensitive analyzer, and when you want a case-insensitive query, use a class which does not convert your terms to lowercase.

Look at the wildcard, prefix, and fuzzy queries.

Narayan answered Nov 09 '22 12:11



Have you already tried copyField? See http://wiki.apache.org/solr/SchemaXml#Copy_Fields

If not, define a new field B with a different configuration and copy field A into B via copyField.
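To make the two-field idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Solr API at all: the class `CaseFallbackSearch` and its `exact` and `lower` maps are hypothetical stand-ins for field A (original case) and field B (a lower-cased copy). It only illustrates the lookup order the question asks for: exact-case matches first, then case-insensitive matches as a fall-back, without duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model of one index with two fields per document.
class CaseFallbackSearch {
    // "Field A": terms stored with their original case.
    private final Map<String, List<Integer>> exact = new HashMap<>();
    // "Field B": a lower-cased copy of the same terms.
    private final Map<String, List<Integer>> lower = new HashMap<>();

    public void add(int docId, String term) {
        exact.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        lower.computeIfAbsent(term.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(docId);
    }

    public List<Integer> search(String term, int limit) {
        // LinkedHashSet keeps insertion order and silently drops duplicates,
        // so exact-case hits always come first.
        Set<Integer> hits = new LinkedHashSet<>(exact.getOrDefault(term, List.of()));
        // Only fall back to the lower-cased field if there is room left.
        if (hits.size() < limit) {
            hits.addAll(lower.getOrDefault(term.toLowerCase(Locale.ROOT), List.of()));
        }
        return hits.stream().limit(limit).collect(Collectors.toList());
    }
}
```

With documents 1, 2 and 3 holding "Apple", "apple" and "APPLE", a search for "Apple" with a limit of 10 returns document 1 first, followed by 2 and 3. In a real index the same ordering would more likely come from boosting the case-sensitive field than from two separate lookups, but that depends on your Lucene or Solr version.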

Karussell answered Nov 09 '22 13:11
