Lucene insensitive whitespace analyzer?

I am using Lucene for searching, and for tags I use the whitespace analyzer. The tags appear to be stored properly. With the standard analyzer, a search for 'C#' also returns results for C and C++. Every analyzer I have tried (I haven't tried all of them) does this, except the whitespace analyzer. That would be fine, except that a search for 'c#' (lowercase c instead of uppercase) returns no results. It is also annoying when I search for a title such as "Lucene insensitive whitespace analyzer?" and the stored title happens to be "Lucene Insensitive Whitespace analyzer?". (Note that the first three words of the stored title are capitalized and the last is not, whereas my search capitalizes only the first word.)

How do I make a case-insensitive whitespace analyzer? Note: WhitespaceAnalyzer is sealed.


1 Answer

Try using LowerCaseFilter in conjunction with WhitespaceTokenizer:

http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/LowerCaseFilter.html

http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/analysis/WhitespaceTokenizer.html
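Since WhitespaceAnalyzer cannot be subclassed, one way to combine the two classes is to write a small Analyzer of your own. The following is a minimal sketch against the Lucene 3.0 Java API linked above; the class name LowerCaseWhitespaceAnalyzer is made up for illustration, and later Lucene versions changed these constructors and the Analyzer contract, so treat it as a starting point rather than a drop-in solution. The Lucene.Net port should offer equivalently named classes, but check the signatures for your version.

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Hypothetical class name; assumes the Lucene 3.0.x API.
public final class LowerCaseWhitespaceAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Split on whitespace only, so a token such as "C#" stays intact...
        TokenStream tokens = new WhitespaceTokenizer(reader);
        // ...then lowercase every token, so "C#" is indexed as "c#".
        return new LowerCaseFilter(tokens);
    }
}
```

Use the same analyzer when indexing and when parsing queries. If only one side is lowercased, the terms will still not match.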

answered Dec 21 '25 10:12 by bajafresh4life


