
Using stop words with WhitespaceAnalyzer

Lucene's StandardAnalyzer removes dots from strings and acronyms when indexing them. I want Lucene to retain the dots, so I'm using the WhitespaceAnalyzer class instead.

I can pass my list of stop words to StandardAnalyzer, but how do I pass it to WhitespaceAnalyzer?

Thanks for reading.

Steve Chapman asked Mar 08 '26 09:03


1 Answer

Create your own analyzer by extending WhitespaceAnalyzer and overriding its tokenStream method as follows.

@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    // Tokenize on whitespace first, then drop any token found in stopSet.
    TokenStream result = super.tokenStream(fieldName, reader);
    return new StopFilter(result, stopSet);
}

Here stopSet is the Set of stop words. One way to supply it is to add a constructor to your analyzer that accepts a list of stop words.

If you plan to reuse the TokenStream, you may also want to override the reusableTokenStream() method in a similar fashion.
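
To check the intended behaviour without depending on a particular Lucene version, here is a minimal plain-Java sketch of the same idea: split on whitespace only (so dots in acronyms survive), then drop tokens found in a stop set. It does not use any Lucene API; the class name WhitespaceStopSketch and the method analyze are hypothetical names chosen for illustration. Matching is exact and case-sensitive, on the assumption that no lowercasing step is applied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene code.
final class WhitespaceStopSketch {

    // Splits text on runs of whitespace and removes tokens present in stopWords.
    // Punctuation such as the dots in "U.S.A." is left untouched.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Skip the empty token produced by blank input, and skip stop words.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>();
        stopSet.add("the");
        stopSet.add("is");
        stopSet.add("a");

        // prints [U.S.A., country]
        System.out.println(analyze("the U.S.A. is a country", stopSet));
    }
}
```

In the Lucene version, the stop set passed to the analyzer's constructor plays the role of stopSet here.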

Shashikant Kore answered Mar 10 '26 06:03


