
What Lucene analyzer can be used to handle Japanese text?

Which Lucene analyzer can handle Japanese text properly? It should be able to handle Kanji, Hiragana, Katakana, Romaji, and any combination of them.

Franz See asked Oct 26 '09 14:10



2 Answers

Take a look at the CJK package in the contrib area of Lucene. It provides an analyzer and a tokenizer (CJKAnalyzer and CJKTokenizer) written specifically for Chinese, Japanese, and Korean text.
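To give a feel for the bigram approach that CJK-style analysis is generally based on, here is a minimal sketch in plain Java. It does not use Lucene and is not Lucene's implementation: the class name CjkBigramSketch and its methods are hypothetical, and the rules (lowercasing non-CJK words, emitting a lone CJK character as a single token) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene code.
public class CjkBigramSketch {

    // Treat Han (Kanji), Hiragana, and Katakana code points as CJK.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA;
    }

    // Emit the pending non-CJK word (e.g. Romaji) as one token.
    static void flushWord(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Emit overlapping two-character tokens for the pending CJK run.
    // A run of length one is emitted as a single-character token.
    static void flushCjk(List<String> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(run.get(0));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            tokens.add(run.get(i) + run.get(i + 1));
        }
        run.clear();
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();  // pending non-CJK letters/digits
        List<String> cjkRun = new ArrayList<>();   // pending CJK characters
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flushWord(word, tokens);
                cjkRun.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                flushCjk(cjkRun, tokens);
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                // Whitespace and punctuation end whatever run is pending.
                flushWord(word, tokens);
                flushCjk(cjkRun, tokens);
            }
        }
        flushWord(word, tokens);
        flushCjk(cjkRun, tokens);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("日本語"));       // [日本, 本語]
        System.out.println(tokenize("Lucene 検索"));  // [lucene, 検索]
    }
}
```

Overlapping bigrams need no dictionary, which keeps things simple, but they produce tokens that do not correspond to real words. Whether that is acceptable depends on how much search precision matters for your data.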

adrianbanks answered Oct 20 '22 02:10



I found lucene-gosen while doing a search for my own purposes:

Their example looks fairly decent, but I guess it's the kind of thing that needs extensive testing. I'm also worried about their backwards-compatibility policy (or rather, the complete lack of one).

Hakanai answered Oct 20 '22 01:10
