I need to index bi-grams of words (tokens) in Lucene. I can generate the n-grams myself and then index them, but I am wondering whether Lucene provides something that will do this for me. As far as I can tell, Lucene's built-in n-gram support only covers character n-grams. Any ideas?
Use the NGramTokenizer:
http://lucene.apache.org/java/2_3_2/api/contrib-analyzers/org/apache/lucene/analysis/ngram/NGramTokenizer.html
The class that you are looking for is the ShingleFilter: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/shingle/ShingleFilter.html
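As a sketch of how ShingleFilter produces word bi-grams, here is a minimal example. It assumes a recent Lucene release (5.x+ package layout and constructors; the class names and the `WhitespaceTokenizer` / sentence string are just illustrative choices, not from the original answer). By default ShingleFilter also emits the single tokens alongside the shingles, so `setOutputUnigrams(false)` is used to get only the bi-grams:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleDemo {

    // Returns the word bi-grams ("shingles") of the input text.
    public static List<String> bigrams(String text) throws IOException {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));

        // min and max shingle size of 2 -> bi-grams only
        ShingleFilter shingles = new ShingleFilter(tokenizer, 2, 2);
        shingles.setOutputUnigrams(false); // suppress the single-word tokens

        CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);
        List<String> result = new ArrayList<>();
        shingles.reset();
        while (shingles.incrementToken()) {
            result.add(term.toString());
        }
        shingles.end();
        shingles.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bigrams("please divide this sentence"));
        // [please divide, divide this, this sentence]
    }
}
```

In a real index you would not iterate the stream by hand; instead you would wrap this tokenizer/filter chain in a custom `Analyzer` and pass that to your `IndexWriter`, so the shingles are produced automatically at indexing time.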