
Analyzer for Russian language in Lucene and Lucene.Net

Tags:

lucene

Lucene has quite poor support for the Russian language.

RussianAnalyzer (part of lucene-contrib) is of very low quality.

The RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead.

Do you know of any better solutions?

Misha asked Sep 15 '08 15:09


2 Answers

My answer is probably too late, but for the record, I've found the analyzers from the AOT project to be much better than those shipped with Lucene.

spariev answered Oct 02 '22 17:10


I used http://code.google.com/p/russianmorphology/

pushistic answered Oct 02 '22 17:10