I want to build my own analyzer that combines tokenizers and filters.
Specifically, I want a single field to be treated as a keyword (the entire stream emitted as a single token) and also lowercased.
If I use only KeywordAnalyzer, the field value keeps its original case. If I use LowerCaseTokenizer or LowerCaseFilter, I have to combine them with other analyzers that do more than KeywordAnalyzer does (splitting at non-letters or at spaces, removing stop words, etc.).
The question is: is there a way, using Lucene filters, tokenizers, or analyzers, to index that field as a keyword (the entire stream as a single token) and also lowercase it?
(google translated, sorry about errors)
This should work:
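import java.io.Reader;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.util.Version;
public final class YourAnalyzer extends ReusableAnalyzerBase {
    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // TokenStreamComponents expects a Tokenizer as its source, not a plain TokenStream.
        final Tokenizer source = new KeywordTokenizer(reader);
        return new TokenStreamComponents(source, new LowerCaseFilter(Version.LUCENE_36, source));
    }
}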
In Lucene 3.6.2 it must look like this:
import java.io.Reader;
import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.util.Version;
public class YourAnalyzer extends ReusableAnalyzerBase {
    private final Version version;
    public YourAnalyzer(final Version version) {
        super();
        this.version = version;
    }
    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // KeywordTokenizer emits the entire input as a single token.
        final Tokenizer source = new KeywordTokenizer(reader);
        // LowerCaseFilter then lowercases that single token.
        return new TokenStreamComponents(source, new LowerCaseFilter(this.version, source));
    }
}
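To make the intended behaviour concrete, here is a small plain-Java sketch that only models the two tokenization strategies discussed in the question. It does not use Lucene at all; the class name `KeywordLowercaseSketch` and both helper methods are made up for illustration, and the letter-splitting rule is a simplified approximation of what LowerCaseTokenizer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordLowercaseSketch {

    // Rough model of KeywordTokenizer + LowerCaseFilter:
    // the whole input becomes one lowercased token.
    public static List<String> keywordLowercase(String input) {
        List<String> tokens = new ArrayList<String>();
        tokens.add(input.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Rough model of LowerCaseTokenizer (an approximation, not Lucene code):
    // split at every non-letter and lowercase each piece.
    public static List<String> letterLowercase(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "New York-NY 10001";
        System.out.println(keywordLowercase(value)); // [new york-ny 10001]
        System.out.println(letterLowercase(value));  // [new, york, ny]
    }
}
```

The first result is what the analyzer above should yield for a field value: one token, lowercased, with spaces, punctuation, and digits left in place. The second shows why LowerCaseTokenizer alone is not a substitute for it.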