Is there a way to know whether a Unicode string contains any Chinese/Japanese character in Python?

Question

I have a Unicode string in Python. I am looking for a way to determine whether it contains any Chinese/Japanese characters. Ideally, I would also like to be able to locate those characters.

This seems a bit different from a language detection problem: my string can be a mixture of English and Chinese text.

My code has Internet access.

nneonneo · Accepted Answer

You can use the Unicode Script property to determine which script each character is commonly associated with.

Sadly, Python's unicodedata module does not expose this property. However, a number of third-party modules, such as unicodedata2 and unicodescript, do have this information. You can query them to check whether any character in your string is in the Han script, which corresponds to Chinese (and Kanji, and Hanja). Japanese text also uses the Hiragana and Katakana scripts, so check for those as well if you need to catch kana.
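If you would rather not add a dependency, here is a minimal standard-library sketch of a cruder alternative. Note that this is not a Script property lookup: it approximates the Han, Hiragana, and Katakana scripts with a hand-picked set of Unicode block ranges, so it will miss the rarer ideograph extension blocks and will count a few punctuation marks that happen to sit in the kana blocks. It assumes Python 3, where a str is a sequence of code points. The names CJK_RANGES, is_cjk, find_cjk, and contains_cjk are made up for this example.

```python
# Assumption: these block ranges stand in for the Unicode Script property.
# They are an approximation, not an exhaustive list of Han/kana code points.
CJK_RANGES = (
    (0x3040, 0x309F),    # Hiragana block
    (0x30A0, 0x30FF),    # Katakana block
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def is_cjk(ch):
    """Return True if the single character ch falls in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def find_cjk(text):
    """Return (index, character) pairs for every matching character in text."""
    return [(i, ch) for i, ch in enumerate(text) if is_cjk(ch)]


def contains_cjk(text):
    """Return True if text has at least one matching character."""
    return any(is_cjk(ch) for ch in text)
```

For example, find_cjk("Hello 世界") returns [(6, '世'), (7, '界')], which covers both parts of the question: whether such characters are present, and where they are. If you need exact script membership rather than block ranges, use one of the third-party modules mentioned above.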

Tags:

python

Dr. Alpha
