 

How to implement a phonetic search using Lucene?

Tags: java, lucene

I want to implement a phonetic search using Lucene 6.1.0, using Soundex or any other algorithm suitable for Portuguese. I found many incomplete examples on the internet showing how to implement a custom tokenizer or analyzer, but the abstract classes used in those examples do not seem to match the ones in version 6.1.0. Can anyone point me to good documentation on Lucene, something beyond the bare Javadocs, that explains how to put the pieces together?

Thanks in advance.

Eduardo Lopes asked Sep 19 '25 12:09


1 Answer

The Analyzer documentation shows how to create your analyzer.

For phonetic analysis, look at the org.apache.lucene.analysis.phonetic package. You'll need to add "lucene-analyzers-phonetic-6.1.0.jar" to your build path, as well as "commons-codec-1.10.jar" from the Apache Commons Codec project.

Then you can set up your analyzer like this, for instance:

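import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.phonetic.DoubleMetaphoneFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Split the text into word tokens first.
        Tokenizer tokenizer = new StandardTokenizer();
        // Max code length 6; inject=false replaces each token with its phonetic code.
        TokenStream stream = new DoubleMetaphoneFilter(tokenizer, 6, false);
        return new TokenStreamComponents(tokenizer, stream);
    }
};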
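
To make it clearer what a phonetic filter does to each token, here is a minimal, standard-library-only sketch of the classic American Soundex encoding mentioned in the question. It is an illustration only: the class name SoundexSketch and its encode method are hypothetical, it is not the commons-codec implementation, and it is tuned for English rather than Portuguese (accented letters are simply skipped).

```java
import java.util.Locale;

class SoundexSketch {
    // Soundex digit for each letter A..Z; '0' marks letters that carry no code.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // this sketch ignores anything outside A-Z, including accents
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters that share a code
            }
            if (code != '0' && code != last) {
                out.append(code); // emit a digit only when the code changes
            }
            last = code;
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // R163
        System.out.println(encode("Rupert")); // R163
    }
}
```

Because "Robert" and "Rupert" reduce to the same code, they would match each other once both the indexed text and the query go through the same encoding. That is the effect the phonetic filter in the analyzer above has on each token, although Double Metaphone produces different codes than Soundex.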
femtoRgon answered Sep 22 '25 01:09