I've found places on the web such as http://www.chinesetopinyin.com that convert Chinese characters to pinyin (romanization).
Does anyone know how to do this, or have a database that can be parsed?
EDIT: I'm using C# but would actually prefer a database/flatfile.
After you set up the Pinyin - Traditional input source, you can enter Traditional Chinese characters using Pinyin phonetic input codes.
A possible solution using Python:
The Unicode (Unihan) database contains pinyin romanizations for Chinese characters, but these are not included in the data exposed by Python's unicodedata module.
However, you can use an external library such as cjklib. For example:
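# coding: UTF-8
# cjklib is a third-party package; it is not part of the standard library.
from cjklib.characterlookup import CharacterLookup

c = u'好'
# 'T' selects the Traditional Chinese character locale.
cjk = CharacterLookup('T')
# A character may have several readings, so a list is returned.
readings = cjk.getReadingForCharacter(c, 'Pinyin')
for r in readings:
    print(r)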
Output:
hāo
hǎo
hào
UPDATE
cjklib comes with a standalone cjknife utility, which might help. Some usage is described here.
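Since the question asks for a database or flat file, here is a minimal sketch of the flat-file route. It assumes the Unihan_Readings.txt file from the Unicode Unihan data, where each data line is tab-separated as code point, field name, value, and where the kMandarin field carries the pinyin reading. The function names and the SAMPLE lines are illustrative only; point the loader at the real file for actual use. The same three-column layout is simple to parse in C# as well.

```python
import io


def load_mandarin_readings(lines):
    """Build {character: [pinyin, ...]} from Unihan_Readings.txt-style lines."""
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        codepoint, field, value = parts
        # Only the Mandarin pinyin field is of interest here.
        if field != "kMandarin":
            continue
        # "U+597D" -> the character with that code point.
        char = chr(int(codepoint[2:], 16))
        table[char] = value.split()
    return table


def to_pinyin(text, table):
    """Replace each known character with its first listed reading."""
    return " ".join(table[ch][0] if ch in table else ch for ch in text)


# Illustrative lines in the assumed file layout (not the full data set).
SAMPLE = (
    "# sample lines in the Unihan_Readings.txt layout\n"
    "U+4F60\tkMandarin\tnǐ\n"
    "U+597D\tkMandarin\thǎo\n"
    "U+597D\tkCantonese\thou2 hou3\n"
)

table = load_mandarin_readings(io.StringIO(SAMPLE))
print(to_pinyin("你好", table))
```

With the real file, replace the io.StringIO(SAMPLE) argument with an open file handle, for example open("Unihan_Readings.txt", encoding="utf-8"). Note that this picks one reading per character and does not disambiguate characters with several pronunciations by context.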