Do you know of a C# library or a dictionary that could help me convert Hiragana to Kanji? I know about the Windows IME, but I would like to fully customize the design of the Kanji candidate list for a given Hiragana input, and that is not possible with this IME.
Example: the user types "toru", which is first converted to Hiragana: "とる". I would then like to get this list of choices:
撮る 取る 盗る
Thanks!
Unfortunately, I do not know of a C# library. Everything I found involves importing native libraries, as in this SO thread: Japanese to Romaji with Kakasi
If you are willing to do so, perhaps JWPce might help.
Although it is implemented as a Japanese text editor, it also contains a dictionary function (it actually contains a multitude of character lookup systems) that does what you want.
Possibly you could compile the project and then reuse that lookup functionality? JWPce is licensed under the GPL, and both a binary executable and the source code are available from the homepage.
[Edit]
After researching some more, I stumbled upon Mozc at Google Code:
Mozc is a Japanese Input Method Editor (IME) designed for multi-platform such as Chromium OS, Windows, Mac and Linux. This open-source project originates from Google Japanese Input.
(BSD license)
I have not looked into it myself yet, but it might be closer to what you are looking for, as it does not have a full application "around it" but is instead intended to be used as a library. Just like you wanted.
They also link to a short video showing what the input looks like: http://www.google.co.jp/ime/
Unfortunately, this is still C++, not .NET, but it might be a starting point.
Microsoft publishes this as a separate product called the Visual Studio International Pack:
http://visualstudiogallery.msdn.microsoft.com/74609641-70BD-4A18-8550-97441850A7A8
I do not know of a C# library either. But given that a dictionary might be sufficient, you may want to look into using the IME dictionary that comes with Anthy.
If you download the sources of the most recent version, you'll find dictionary sources in the mkworddic and alt-cannadic directories. Look at the various files ending in .t.
Note that they are encoded in EUC-JP; you might want to convert them to UTF-8.
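For what it's worth, here is a minimal Python sketch of that conversion step plus a reading-to-candidates lookup. The line format it parses (reading first, `#`-prefixed tag tokens, every other token a candidate) is my assumption, so check it against the actual .t files before relying on it. The names `decode_euc_jp` and `build_lookup` and the sample data are made up for illustration.

```python
from typing import Dict, List


def decode_euc_jp(raw: bytes) -> str:
    """Decode EUC-JP bytes into a string (which can then be written out as UTF-8)."""
    return raw.decode("euc_jp")


def build_lookup(text: str) -> Dict[str, List[str]]:
    """Map each reading to its candidate words, in order of first appearance.

    ASSUMED line format (not confirmed above): a reading, followed by
    whitespace-separated fields, where tokens starting with '#' are tags
    (part of speech, frequency) and all other tokens are candidates.
    """
    lookup: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines, comment-like lines, and bare readings
        reading = fields[0]
        candidates = [tok for tok in fields[1:] if not tok.startswith("#")]
        if not candidates:
            continue
        bucket = lookup.setdefault(reading, [])
        for word in candidates:
            if word not in bucket:  # keep each candidate only once
                bucket.append(word)
    return lookup


# Made-up sample in the assumed format, encoded as EUC-JP to mimic a .t file.
sample_bytes = "とる #R5*500 取る 撮る\nとる #R5*300 盗る\n".encode("euc_jp")

text = decode_euc_jp(sample_bytes)
utf8_bytes = text.encode("utf-8")  # what you would save as the converted file

lookup = build_lookup(text)
candidates = lookup.get("とる", [])
print(candidates)  # ['取る', '撮る', '盗る']
```

The same idea should carry over to C# with a `Dictionary<string, List<string>>` keyed by the Hiragana reading; the candidate list it returns is plain data, so you can render it in whatever custom UI you like.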