How to get a Token from a Lucene TokenStream?

Question

I'm trying to use Apache Lucene for tokenizing, and I am baffled by the process of obtaining Tokens from a TokenStream.

The worst part is that I'm looking at the comments in the JavaDocs that address my question.

http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29

Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss.

Can anyone explain how to get token-like information from a TokenStream?

Adam Paynter · Accepted Answer

Yeah, it's a little convoluted (compared to the good ol' way), but this should do it:

TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);

while (tokenStream.incrementToken()) {
    int startOffset = offsetAttribute.startOffset();
    int endOffset = offsetAttribute.endOffset();
    String term = termAttribute.term();
}

Edit: The new way

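According to Donotello, TermAttribute has been deprecated in favor of CharTermAttribute. According to jpountz (and Lucene's documentation), addAttribute is preferable to getAttribute: addAttribute returns the existing attribute instance or registers a new one, whereas getAttribute only works if the stream already has that attribute (in the 3.x API it throws an IllegalArgumentException otherwise). Note also that reset() must be called before the first incrementToken().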

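TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

tokenStream.reset(); // required before the first incrementToken() call
while (tokenStream.incrementToken()) {
    // The attribute objects are reused; their values change on every iteration.
    int startOffset = offsetAttribute.startOffset();
    int endOffset = offsetAttribute.endOffset();
    String term = charTermAttribute.toString();
}
tokenStream.end();   // records end-of-stream state such as the final offset
tokenStream.close(); // releases resources held by the stream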
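If the attribute idea itself is what feels odd, here is a rough toy model of the pattern in plain Java with no Lucene dependency. Everything in it (AttributeSketch, TermAndOffset, WhitespaceStream, describeTokens) is a made-up name for illustration and not part of Lucene's API. It only tries to show why you fetch the attribute once, before the loop, and why you have to copy values out on each iteration: the stream overwrites the same object every time incrementToken() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the attribute pattern. These are NOT Lucene classes;
// all names here are hypothetical stand-ins.
final class AttributeSketch {

    /** Stand-in for CharTermAttribute and OffsetAttribute rolled into one. */
    static final class TermAndOffset {
        private String term = "";
        private int startOffset;
        private int endOffset;

        String term() { return term; }
        int startOffset() { return startOffset; }
        int endOffset() { return endOffset; }
    }

    /** Whitespace tokenizer that reuses a single attribute instance. */
    static final class WhitespaceStream {
        private final String text;
        private final TermAndOffset attribute = new TermAndOffset();
        private int position;

        WhitespaceStream(String text) { this.text = text; }

        /** Returns the one shared attribute; callers keep this reference. */
        TermAndOffset attribute() { return attribute; }

        /** Advances to the next token, overwriting the shared attribute in place. */
        boolean incrementToken() {
            while (position < text.length() && Character.isWhitespace(text.charAt(position))) {
                position++;
            }
            if (position >= text.length()) {
                return false; // no more tokens
            }
            int start = position;
            while (position < text.length() && !Character.isWhitespace(text.charAt(position))) {
                position++;
            }
            attribute.term = text.substring(start, position);
            attribute.startOffset = start;
            attribute.endOffset = position;
            return true;
        }
    }

    /** Copies each token's state out of the attribute before the next call replaces it. */
    static List<String> describeTokens(String text) {
        WhitespaceStream stream = new WhitespaceStream(text);
        TermAndOffset attr = stream.attribute(); // fetched once, outside the loop
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(attr.term() + "[" + attr.startOffset() + "," + attr.endOffset() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        describeTokens("quick brown fox").forEach(System.out::println);
    }
}
```

The same habit carries over to the real thing: if you want Token-like objects to keep around, build them inside the loop from the attribute values rather than storing the attribute references themselves.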

Tags: java, attributes, token, lucene, tokenize

Eric Wilson
