I'm wondering whether there is any way to check if a Chinese character is Simplified Chinese or Traditional Chinese in Python 3?
The most obvious difference between traditional Chinese and simplified Chinese is the way that the characters look. Traditional characters are typically more complicated and have more strokes, while simplified characters are, as the name suggests, simpler and have fewer strokes.
Optical character recognition (OCR) – Many apps and websites provide OCR features where you can scan or take pictures of the character(s) you want to look up. Google Docs has such a feature and there are others online you can easily find by searching for “Chinese” and “OCR”.
The default encoding for Python 3 source code is UTF-8, and the language's str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode [6].
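To make that concrete, here is a small sketch using only the standard library. It shows that the simplified and traditional forms of "country" are simply two different Unicode code points inside a str. The helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and the Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


simplified = "国"   # simplified form of "country"
traditional = "國"  # traditional form of "country"

print(describe(simplified))
print(describe(traditional))

# The two forms are distinct characters, so they never compare equal.
print(simplified == traditional)
```

Note that Unicode itself does not label a code point as simplified or traditional, which is why a lookup table or a library is needed for the actual classification.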
cjklib does not support Python 3. In Python 3, you can use hanzidentifier.
import hanzidentifier
print(hanzidentifier.has_chinese('Hello my name is John.'))
# False
print(hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.'))
# True
print(hanzidentifier.is_simplified('John说:你好!'))
# True
print(hanzidentifier.is_traditional('John說:你好!'))
# True
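Since the question is about a single character, a possible approach is to pass that one character to `hanzidentifier.identify()` and map the returned constant to a label. This is only a sketch: the wrapper `classify_char` and its label strings are invented for this example, and it assumes the third-party package has been installed (for instance with `pip install hanzidentifier`).

```python
def classify_char(ch: str) -> str:
    """Label one character using hanzidentifier's identify() result."""
    # Imported here so this module still loads without the third-party package.
    import hanzidentifier

    # The label strings are arbitrary names chosen for this sketch.
    labels = {
        hanzidentifier.SIMPLIFIED: "simplified",
        hanzidentifier.TRADITIONAL: "traditional",
        hanzidentifier.BOTH: "both",
        hanzidentifier.MIXED: "mixed",
        hanzidentifier.UNKNOWN: "unknown",
    }
    return labels[hanzidentifier.identify(ch)]


if __name__ == "__main__":
    for ch in "国國你A":
        print(ch, classify_char(ch))
```

Many characters are written the same way in both systems, so a "both" result is common and should not be treated as an error.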
You can use getCharacterVariants() in cjklib to query the character's simplified (S) and traditional (T) variants. As described in the Unihan database documentation, you can use this data to determine the classification for a character.
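The classification idea can be sketched without cjklib. The two dictionaries below are a tiny hand-written sample standing in for real variant data (they are not read from Unihan or cjklib), and the `classify` function and its rule are a simplified assumption: a character that only has a traditional variant is treated as simplified, one that only has a simplified variant is treated as traditional, and everything else is treated as valid in both.

```python
from typing import Dict, Set

# Hypothetical sample data: character -> its variants of the other kind.
TRADITIONAL_VARIANTS: Dict[str, Set[str]] = {"国": {"國"}}
SIMPLIFIED_VARIANTS: Dict[str, Set[str]] = {"國": {"国"}}


def classify(ch: str) -> str:
    """Classify one character from its simplified/traditional variants."""
    # Ignore entries where a character lists itself as its own variant.
    has_traditional = bool(TRADITIONAL_VARIANTS.get(ch, set()) - {ch})
    has_simplified = bool(SIMPLIFIED_VARIANTS.get(ch, set()) - {ch})

    if has_traditional and not has_simplified:
        return "simplified"
    if has_simplified and not has_traditional:
        return "traditional"
    # No variants at all, or variants in both directions.
    return "both"


for ch in "国國你":
    print(ch, classify(ch))
```

With real data the tables would be filled from the variant lookups instead of being typed in by hand.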