
List of "tokens" on Lucene 3

Tags: token, lucene

I'm new to Lucene. I started learning with the version 3 branch, and there's one thing I don't understand (obviously because I'm not experienced in the subject).

In Lucene 2.9, if I wanted a list of tokens I would create an ArrayList of the Token class, ArrayList<Token> for example. That's pretty intuitive to me, and the concept of a token is very clear.

Now that use of the Token class is discouraged in favour of the Attribute-based API, do I have to create my own class to encapsulate the attributes I want? If so, isn't that almost recreating Lucene's Token class?

I'm writing a class to test analyzers, and having a list of the resulting tokens would make testing easier, I guess.

Any help would be appreciated ;) Thank you!

Fabio asked Oct 12 '10 16:10


2 Answers

According to the Token Javadoc, "Even though it is not necessary to use Token anymore, with the new TokenStream API it can be used as convenience class that implements all Attributes, which is especially useful to easily switch from the old to the new TokenStream API."

I suggest you keep using Token. Your use case, a convenience class that bundles all the attributes of a token, is exactly what the Javadoc describes.

Yuval F answered Sep 25 '22 22:09


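Use the TermAttribute class (on Lucene 3.1 and later it is deprecated in favour of CharTermAttribute):

// tokenStream() takes a Reader (java.io.StringReader here), not a String
TokenStream stream = analyzer.tokenStream("field", new StringReader("text"));
// addAttribute() returns the stream's TermAttribute, registering one if absent
TermAttribute termAttr = stream.addAttribute(TermAttribute.class);
while (stream.incrementToken()) {
    String token = termAttr.term();  // term text of the current token
}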
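
If you want the results in a list for testing, keep in mind that the stream reuses the same attribute instance on every incrementToken() call, so you need to copy the values out rather than store the attribute object itself. Below is a minimal sketch of that idea. TokenSnapshot and TokenSnapshotDemo are hypothetical names (not Lucene classes), and a plain whitespace split stands in for the analyzer so the example runs without Lucene on the classpath; the comment shows what the loop body would probably look like against a real TokenStream.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder (not a Lucene class): an immutable copy of one token's values. */
final class TokenSnapshot {
    final String term;
    final int startOffset;
    final int endOffset;

    TokenSnapshot(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public String toString() {
        return term + "[" + startOffset + "," + endOffset + ")";
    }
}

final class TokenSnapshotDemo {
    /**
     * Splits on whitespace and records each token with its character offsets.
     * This is an assumption-laden stand-in for an analyzer. With a real
     * TokenStream the loop body would instead be something like:
     *
     *   out.add(new TokenSnapshot(termAttr.term(),
     *                             offsetAttr.startOffset(),
     *                             offsetAttr.endOffset()));
     *
     * where termAttr and offsetAttr are obtained via stream.addAttribute(...).
     */
    static List<TokenSnapshot> snapshotWhitespace(String text) {
        List<TokenSnapshot> out = new ArrayList<TokenSnapshot>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                // Copy the values into a fresh object for each token.
                out.add(new TokenSnapshot(text.substring(start, i), start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (TokenSnapshot t : snapshotWhitespace("quick brown fox")) {
            System.out.println(t);
        }
    }
}
```

A holder like this only carries the attributes your tests actually check, which is the main difference from reusing Lucene's Token class wholesale.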
Fred Foo answered Sep 24 '22 22:09