I want to implement phonetic search using Lucene 6.1.0, with Soundex or any algorithm suitable for Portuguese. I found many incomplete examples on the internet showing how to implement a custom tokenizer and analyzer, but the abstract classes used in those examples seem to differ from the ones in version 6.1.0. Can anyone point me to good documentation on Lucene, not just Javadocs, that explains how to put the pieces together?
Thanks in advance.
The Analyzer documentation shows how to create your analyzer.
For phonetic analysis, look at the org.apache.lucene.analysis.phonetic package. You'll need to add "lucene-analyzers-phonetic-6.1.0.jar" to your build path, as well as Apache's "commons-codec-1.10.jar" (available from the Apache Commons Codec project).
Then you can set up your analyzer, for instance:
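```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.phonetic.DoubleMetaphoneFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split the text into words, then replace each word with its Double Metaphone code
        Tokenizer tokenizer = new StandardTokenizer();
        // maxCodeLength = 6; inject = false emits only the codes, not the original words
        TokenStream stream = new DoubleMetaphoneFilter(tokenizer, 6, false);
        return new TokenStreamComponents(tokenizer, stream);
    }
};
```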
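The question mentions Soundex, so to make the encoding step concrete here is a minimal, dependency-free sketch of classic American Soundex in plain Java. The class name `SoundexSketch` is hypothetical, and this is not the commons-codec implementation that Lucene's phonetic filters delegate to; in a real Lucene setup you would presumably wrap commons-codec's `Soundex` encoder in `PhoneticFilter` rather than write your own. Soundex, like Double Metaphone, was designed around English pronunciation, so its codes for Portuguese words are only a rough approximation.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical illustration of classic American Soundex; not the commons-codec encoder. */
public class SoundexSketch {
    // Soundex digit for each letter A..Z; '0' marks uncoded letters (A E I O U Y H W).
    private static final String CODES = "01230120022455012623010202";

    private static char code(char letter) {
        return CODES.charAt(letter - 'A');
    }

    public static String soundex(String word) {
        // Assumption of this sketch: strip accents ("João" -> "Joao"), upper-case, keep only A-Z.
        String s = Normalizer.normalize(word, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        // The first letter is kept as-is; its code still suppresses an identical following code.
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            char digit = code(ch);
            // Append a digit only if the letter is coded and differs from the previous code.
            if (digit != '0' && digit != last) {
                out.append(digit);
            }
            // H and W do not separate letters with the same code; vowels (and Y) do.
            if (ch != 'H' && ch != 'W') {
                last = digit;
            }
        }
        // Pad to the standard four-character form, e.g. "S2" -> "S200".
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Sousa", "Souza", "Robert", "Rupert"}) {
            System.out.println(w + " -> " + soundex(w));
        }
    }
}
```

Running `main` prints `S200` for both "Sousa" and "Souza" and `R163` for both "Robert" and "Rupert": spelling variants collapse to the same code, which is what lets a phonetic index match them. Inside Lucene, accent folding would be a separate filter placed before the phonetic filter rather than part of the encoder.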