For example, if I want to match a string consisting of m to n Chinese characters, I could use:
[single Chinese character regular expression]{m,n}
Is there a regular expression for a single Chinese character, one that matches any Chinese character that exists?
UTF-8 is a character encoding. It is backward compatible with ASCII, so plain ASCII text is already valid UTF-8, while still being able to encode international characters, such as Chinese characters. As of the mid 2020s, UTF-8 is one of the most popular encodings.
Short answer: yes.
Simplified Chinese in the Solaris 8 environment provides three locales: zh, zh.UTF-8, and zh.GBK.
Literal Characters and Sequences
For instance, you might need to search for a dollar sign ("$") as part of a price list, or in a computer program as part of a variable name. Since the dollar sign is a metacharacter that means "end of line" in regex, you must escape it with a backslash to use it literally.
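As a small illustration of the escaping rule above, here is a sketch in Java. The class and method names (DollarEscape, hasPrice, endsWithDigit) are made up for this example. Note that the backslash itself has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the original text.
class DollarEscape {
    // "\\$" in Java source is the regex \$ : a literal dollar sign.
    private static final Pattern PRICE = Pattern.compile("\\$\\d+");

    // An unescaped $ keeps its metacharacter meaning: end of input/line.
    private static final Pattern TRAILING_DIGIT = Pattern.compile("\\d$");

    static boolean hasPrice(String s) {
        return PRICE.matcher(s).find();
    }

    static boolean endsWithDigit(String s) {
        return TRAILING_DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasPrice("cost: $42"));      // true
        System.out.println(hasPrice("cost: 42"));       // false
        System.out.println(endsWithDigit("cost: $42")); // true
    }
}
```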
The regex to match a Chinese (well, CJK) character is
\p{script=Han}
which can be abbreviated to simply
\p{Han}
This assumes that your regex compiler meets requirement RL1.2 Properties from UTS#18 Unicode Regular Expressions. Perl and Java 7 both meet that spec, but many others do not.
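In Java, you can instead match by Unicode block. The following matches one to three characters from the CJK Unified Ideographs block, which covers only U+4E00 to U+9FFF and is therefore narrower than the Han script: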
\p{InCJK_UNIFIED_IDEOGRAPHS}{1,3}
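To tie this back to the original m-to-n question, here is a minimal Java sketch, assuming Java 7 or later. The class and method names (HanMatcher, isHanBetween, inBasicBlock) are invented for illustration. It uses the `\p{script=Han}` form, which Java accepts, and compares it with the block-based pattern on a character outside U+4E00 to U+9FFF.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class HanMatcher {
    // True if the whole string consists of m to n Han-script characters.
    static boolean isHanBetween(String s, int m, int n) {
        Pattern p = Pattern.compile("\\p{script=Han}{" + m + "," + n + "}");
        return p.matcher(s).matches();
    }

    // Same check, restricted to the CJK Unified Ideographs block.
    static boolean inBasicBlock(String s, int m, int n) {
        Pattern p = Pattern.compile(
                "\\p{InCJK_UNIFIED_IDEOGRAPHS}{" + m + "," + n + "}");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String two = "\u4E2D\u6587";              // two common characters
        String four = "\u4E2D\u6587\u5B57\u7B26"; // four characters
        String extB = "\uD840\uDC00";             // U+20000, Extension B

        System.out.println(isHanBetween(two, 1, 3));   // true
        System.out.println(isHanBetween(four, 1, 3));  // false: too long
        System.out.println(isHanBetween("abc", 1, 3)); // false: not Han
        System.out.println(isHanBetween(extB, 1, 1));  // true: Han script
        System.out.println(inBasicBlock(extB, 1, 1));  // false: other block
    }
}
```

Both patterns count code points rather than Java `char` values, so the supplementary character U+20000 counts as a single character in the `{m,n}` quantifier.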