I have implemented Lucene for my application and it works very well until you introduce something like Japanese characters.
The problem is that if I have the Japanese string こんにちは、このバイネイです and I search with こ, the first character, then it works well, whereas if I use more than one Japanese character (こんにち) in the search token, the search fails and no document is found.
Are Japanese characters supported in Lucene? What settings are needed to get this working?
Lucene's built-in analyzer does not support Japanese.
You need to install an analyzer such as Sen, which is a Java port of MeCab, a popular and fast Japanese morphological analyzer.
There are two subtypes called
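Here is a minimal sketch of the idea, assuming the kuromoji-based JapaneseAnalyzer from Lucene's lucene-analysis-kuromoji module (Sen / lucene-gosen expose an equivalent Analyzer you can swap in at the same spot); the key point is that the same morphological analyzer runs at both index time and query time:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class JapaneseSearchDemo {
    public static void main(String[] args) throws Exception {
        // Morphological analyzer for Japanese; a Sen-based analyzer
        // would be dropped in here instead.
        Analyzer analyzer = new JapaneseAnalyzer();

        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("body", "こんにちは、このバイネイです", Field.Store.YES));
            writer.addDocument(doc);
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // A multi-character query token; because the same analyzer
            // runs at query time, it is segmented consistently with
            // what was indexed.
            Query q = new QueryParser("body", analyzer).parse("バイネイ");
            TopDocs hits = searcher.search(q, 10);
            System.out.println("matches: " + hits.totalHits);
        }
    }
}
```

A mismatch between how the document and the query are tokenized is the usual cause of the "one character matches, several characters don't" symptom described in the question.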
I don't think there can be an analyzer that will work for all languages. The problem is that different languages have different rules about word boundaries and stemming (for example, the Thai language doesn't use spaces at all to separate words). Or if there is, I certainly wouldn't want to be the maintainer!
What you will need to do is "tag" blocks of text as one language or another and use the correct analyzer for that particular language. You can attempt to detect the language automatically by doing character analysis (e.g. text consisting predominantly of hiragana and katakana is likely Japanese), as in the sketch below.
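As a rough sketch of both points, the snippet below routes text whose characters look Japanese to a dedicated field and uses Lucene's PerFieldAnalyzerWrapper so each field gets the right analyzer; the looksJapanese helper and the field names are made up for illustration:

```java
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class LanguageRouting {

    // Crude script-based heuristic (illustrative only): treat text
    // containing any hiragana or katakana as Japanese.
    static boolean looksJapanese(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeBlock b = Character.UnicodeBlock.of(cp);
            return b == Character.UnicodeBlock.HIRAGANA
                || b == Character.UnicodeBlock.KATAKANA;
        });
    }

    // One analyzer per "tagged" field: body_ja gets the Japanese
    // analyzer, everything else falls back to StandardAnalyzer.
    static Analyzer buildAnalyzer() {
        return new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(),
            Map.of("body_ja", new JapaneseAnalyzer()));
    }

    public static void main(String[] args) {
        // Route incoming text to the field whose analyzer suits it.
        String text = "こんにちは、このバイネイです";
        String field = looksJapanese(text) ? "body_ja" : "body_en";
        System.out.println(text + " -> " + field);
    }
}
```

The same wrapped analyzer is then handed to IndexWriterConfig and to the query parser, so indexing and searching stay consistent for each field.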