Lucene's StandardAnalyzer removes dots from string/acronyms when indexing it. I want Lucene to retain dots and hence I'm using WhitespaceAnalyzer class.
I can give my list of stop words to StandardAnalyzer...but how do i give it to WhitespaceAnalyzer?
Thanks for reading.
Create your own analyzer by extending WhiteSpaceAnalyzer and override tokenStream method as follows.
public TokenStream tokenStream(String fieldName, Reader reader) {
TokenStream result = super.tokenStream(fieldName, reader);
result = new StopFilter(result, stopSet);
return result;
}
Here the stopSet is the Set of stop words, which you could get by adding a constructor to your analyzer which accepts a list of stop words.
You may also wish to override reusableTokenStream() method in similar fashion if you plan to reuse the TokenStream.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With