
How to convert Chinese characters to Pinyin


To sort Chinese-language text, I want to convert Chinese characters to Pinyin, properly separating each character and grouping successive characters together.

Could you help with this by describing the logic or providing source code?

Please also let me know if an open-source library already exists for this.

Ashish Yadav asked Jan 27 '11 05:01





1 Answer

Short answer: you don't.

Long answer: There is no one-to-one mapping from 汉字 (Chinese characters) to 汉语拼音 (Hanyu Pinyin). Just some quick examples:

  • 把 can be "ba" in the third tone or fourth tone.
  • 了 can be "le" toneless or "liao" third tone.
  • 乐 can be "le" or "yue", both in the fourth tone.
  • 落 can be "luo", "la" or "lao", all in the fourth tone.
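To make the size of the problem concrete, here is a minimal sketch that enumerates every candidate transliteration of a short string. The `READINGS` table and the `candidate_readings` function are hypothetical names written for this illustration; the table holds only the four characters listed above, and the tone digits (with 5 for the neutral tone) are just one common way of writing tones in plain text.

```python
from itertools import product

# Hypothetical hand-written table covering only the four examples above.
# Tone digits use numbered pinyin; 5 marks the neutral (toneless) reading.
READINGS = {
    "把": ["ba3", "ba4"],
    "了": ["le5", "liao3"],
    "乐": ["le4", "yue4"],
    "落": ["luo4", "la4", "lao4"],
}

def candidate_readings(text):
    """Return every per-character combination of readings for text.

    Characters missing from READINGS are passed through unchanged.
    """
    per_char = [READINGS.get(ch, [ch]) for ch in text]
    return [" ".join(combo) for combo in product(*per_char)]

candidates = candidate_readings("落了")
print(len(candidates))  # 6
for candidate in candidates:
    print(candidate)
```

Even a two-character string yields six candidates here, and nothing in the table says which one is right. Picking the correct reading requires knowing the surrounding words and meaning, which a lookup table alone cannot supply.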

And so on. I have a beginners' book on this topic that has 207 examples. I stress that this is a beginners' book and is by no means complete. Each entry has a page or two of usage examples and the conditions under which you choose the appropriate pronunciation. It is not something that could be easily programmed (if at all).

And this doesn't even address the other thing you want to deal with: the separation of characters into grouped words. The very notion of a word is a bit slippery in Chinese. There are two terms that correspond, roughly, to "word" in Chinese: 字 and 词. The first is the single character; the second is a group of characters put together into one concept. (I frequently get asked by Chinese speakers how many "words" I can read when they really mean "characters".) In some cases the distinction is clear: the 词 "乌鸦", for example, means "crow". The two 字 must stay together to express the idea properly, and it would be incorrect to translate it as "black crow". In other cases it is not so clear. What does "你好" translate to? Is it one word meaning, idiomatically, "hello"? Or is it two words translating literally to "you good"? Each of the characters involved can stand alone or in groups with other characters, but together they mean something entirely different from their individual meanings. Given this, how, precisely, do you plan to group the 汉语拼音 transliterations (which are difficult to impossible to get right in the first place!) into "words"?
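The grouping problem can be sketched the same way. The snippet below lists every way to split a string into entries of a word list. The `LEXICON` set and the `segmentations` function are hypothetical, made up for this illustration, and the lexicon contains only the examples from this answer; a real word list would be far larger and would produce many more competing splits.

```python
# Hypothetical toy lexicon built from the examples in this answer.
# "乌鸦" is listed only as a whole word; "你好" is listed both as a
# whole word and as two standalone characters.
LEXICON = {"你", "好", "你好", "乌鸦"}

def segmentations(text, lexicon):
    """Return every split of text into consecutive lexicon entries."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Combine this head word with every split of the remainder.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results

print(segmentations("你好", LEXICON))  # [['你', '好'], ['你好']]
print(segmentations("乌鸦", LEXICON))  # [['乌鸦']]
```

Both splits of "你好" are valid according to the lexicon, so the code alone cannot decide whether the Pinyin should be grouped as one word or two. That decision again depends on meaning and context.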

JUST MY correct OPINION answered Sep 20 '22 01:09
