I'd like to make a regex in Perl that will test a string for characters in a particular script. This would be something like:
$text =~ .*P{'Chinese'}.*
Is there a simple way of doing this? For English it's pretty easy: just test for [a-zA-Z]. But for a script like Chinese, or one of the Japanese scripts, I can't figure out any way of doing this short of writing out every character explicitly, which would make for some very ugly code. Ideas? I can't be the first or only person who has wanted to do this.
To check whether a string contains at least one letter, you can use the [a-zA-Z] character class, for example in JavaScript. It matches any lowercase letter from a to z or any uppercase letter from A to Z. The square brackets are what make it a character class, and the hyphens define the ranges. [A-Za-z] is equivalent and matches the same ASCII letters.
In regular expressions, a period "." matches any single character (by default, any character except a newline). To match one character out of a given set, use a character class.
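The same character class works unchanged in Perl. A minimal sketch follows; the helper name has_ascii_letter is made up for illustration and is not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string contains at least one
# ASCII letter, 0 otherwise.
sub has_ascii_letter {
    my ($s) = @_;
    return $s =~ /[A-Za-z]/ ? 1 : 0;
}

print has_ascii_letter('1234x'), "\n";   # 1
print has_ascii_letter('1234'),  "\n";   # 0
```

No anchors or .* are needed here: an unanchored match succeeds as soon as the class matches anywhere in the string.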
Look at perldoc perluniprops, which provides an exhaustive list of the properties you can use with \p. You’ll be interested in \p{CJK_Unified_Ideographs} and related properties such as \p{CJK_Symbols_And_Punctuation}. \p{Hiragana} and \p{Katakana} give you the kana. There is also a \p{Script=...} property for a number of scripts: \p{Han} and \p{Script=Han} match Han characters (Chinese), but there is no corresponding \p{Script=Japanese}, quite simply because Japanese has multiple scripts.
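As a rough sketch of how these properties might be used, the snippet below reports which of the three scripts appear in a string. The helper name scripts_in is hypothetical, and the code assumes the string has already been decoded into Perl characters (here via use utf8 for the literal; input read from a file or socket would need decoding first, for example with Encode::decode).

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

# Hypothetical helper: list which of Han, Hiragana and Katakana
# occur at least once in an already-decoded string.
sub scripts_in {
    my ($text) = @_;
    my @found;
    push @found, 'Han'      if $text =~ /\p{Script=Han}/;
    push @found, 'Hiragana' if $text =~ /\p{Hiragana}/;
    push @found, 'Katakana' if $text =~ /\p{Katakana}/;
    return @found;
}

# Only ASCII is printed, so no output encoding layer is needed here.
print join(',', scripts_in('漢字とかなカナ')), "\n";   # Han,Hiragana,Katakana
print join(',', scripts_in('hello')), "\n";            # (empty line)
```

Since Japanese text mixes all three scripts, a single test for "any of them" can be written as one bracketed class: /[\p{Han}\p{Hiragana}\p{Katakana}]/. Note that \p{Han} alone cannot tell Chinese from Japanese kanji, because both use the same Han characters.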