Using Apache Lucene's TokenStream to remove stop words causes an error:
TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
I use this code:
public static String removeStopWords(String string) throws IOException {
    TokenStream tokenStream = new StandardTokenizer(Version.LUCENE_47, new StringReader(string));
    TokenFilter tokenFilter = new StandardFilter(Version.LUCENE_47, tokenStream);
    TokenStream stopFilter = new StopFilter(Version.LUCENE_47, tokenFilter, StandardAnalyzer.STOP_WORDS_SET);
    StringBuilder stringBuilder = new StringBuilder();
    CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
    while (stopFilter.incrementToken()) {
        if (stringBuilder.length() > 0) {
            stringBuilder.append(" ");
        }
        stringBuilder.append(token.toString());
    }
    stopFilter.end();
    stopFilter.close();
    return stringBuilder.toString();
}
But as you can see, I never call reset() or close().
So why am I getting this error?
"I never call reset() or close()."
Well, that is exactly your problem: reset() has to be called before the first call to incrementToken(). If you read the TokenStream
Javadoc, you will find the following:
The workflow of the new TokenStream API is as follows:
- Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
- The consumer calls TokenStream#reset().
- ...
I only had to add one line with reset() to your code, and it worked.
...
CharTermAttribute token = tokenStream.getAttribute(CharTermAttribute.class);
tokenStream.reset(); // I added this
while(stopFilter.incrementToken()) {
...
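To make the reason for the exception more concrete, here is a toy model of the consuming workflow quoted above. It is only a sketch and does not use Lucene at all: ToyTokenStream, ToyStopWordDemo, term() and the three-word stop list are hypothetical names made up for illustration. The assumption it illustrates is that the stream has no usable input until reset() is called, so calling incrementToken() first fails.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Toy model only: a hypothetical class, NOT part of Lucene.
final class ToyTokenStream implements AutoCloseable {
    private final String text;
    private final Set<String> stopWords;
    private Iterator<String> tokens; // stays null until reset() is called
    private String current;

    ToyTokenStream(String text, Set<String> stopWords) {
        this.text = text;
        this.stopWords = stopWords;
    }

    // Prepare the stream for consumption; must come before incrementToken().
    void reset() {
        tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+")).iterator();
    }

    // Advance to the next token that is not a stop word; false when exhausted.
    boolean incrementToken() {
        if (tokens == null) {
            throw new IllegalStateException("contract violation: reset() call missing");
        }
        while (tokens.hasNext()) {
            String candidate = tokens.next();
            if (!candidate.isEmpty() && !stopWords.contains(candidate)) {
                current = candidate;
                return true;
            }
        }
        return false;
    }

    // The term the stream is currently positioned on.
    String term() {
        return current;
    }

    // End-of-stream bookkeeping would go here; nothing to do in the toy.
    void end() {
    }

    // Release the input, so the stream needs another reset() before reuse.
    @Override
    public void close() {
        tokens = null;
    }
}

final class ToyStopWordDemo {
    // Same consuming order as the workflow: reset, incrementToken loop, end, close.
    static String removeStopWords(String input) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "is"));
        StringBuilder out = new StringBuilder();
        try (ToyTokenStream stream = new ToyTokenStream(input, stopWords)) {
            stream.reset();
            while (stream.incrementToken()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(stream.term());
            }
            stream.end();
        } // close() runs here, even if the loop throws
        return out.toString();
    }
}
```

Calling ToyStopWordDemo.removeStopWords("The cat is a pet") walks through the full workflow, while calling incrementToken() on a fresh ToyTokenStream throws IllegalStateException. In real Lucene code the usual advice is to call reset(), end() and close() on the outermost stream you consume (stopFilter in the question), since a filter's reset() is expected to pass the call down the chain.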