I'm new to Lucene, i started learning the version 3 branch and there's one thing i don't understand (obviously because i'm not experienced in the subject).
In Lucene 2.9, if i wanted a list of tokens i would create an ArrayList of Token class, ArrayList for example. That's pretty intuitive for me and the concept of token is very clear.
Now that the use of Token class is disencouraged in favour of the Attribute based API, do i have to create my own class to encapsulate the attributes i want? If yes, isn't that almost recreating the Lucene's Token class?
I'm doing a class to test analyzers, and having a list of resulting tokens makes it easier to test, i guess.
Any help would be appreciated ;) Thank you!
According to the Token Javadoc, "Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all Attributes, which is especially useful to easily switch from the old to the new TokenStream API."
I suggest you keep using a Token. It matches the description above.
Use the TermAttribute
class:
TokenStream stream = analyzer.tokenStream("field", "text");
TermAttribute termAttr = stream.getAttribute(TermAttribute.class);
while (stream.incrementToken()) {
String token = termAttr.term();
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With