How Can I Run a Regex that Tests Text for Characters in a Particular Alphabet or Script?

Tags:

perl

I'd like to make a regex in Perl that will test a string for characters in a particular script. This would be something like:

$text =~ .*P{'Chinese'}.*

Is there a simple way of doing this? For English it's pretty easy: just test for [a-zA-Z]. But for a script like Chinese, or one of the Japanese scripts, I can't figure out any way of doing this short of writing out every character explicitly, which would make for some very ugly code. Ideas? I can't be the first or only person who's wanted to do this.

asked Nov 30 '11 22:11

Eli

1 Answer

Look at perldoc perluniprops, which provides an exhaustive list of properties you can use with \p. You’ll be interested in \p{CJK_Unified_Ideographs} and related properties such as \p{CJK_Symbols_And_Punctuation}. \p{Hiragana} and \p{Katakana} give you the kana. There is also a \p{Script=...} property for a number of scripts: \p{Han} and \p{Script=Han} match Han characters (used for Chinese, and for kanji in Japanese), but there is no corresponding \p{Script=Japanese}, quite simply because Japanese is written with multiple scripts.
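As a minimal sketch of how those properties could be used, the snippet below reports which scripts appear in a string. The helper name scripts_in and the sample strings are made up for illustration, and it assumes the text is already a decoded character string (for example, read through an :encoding(UTF-8) layer) rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perluniprops): returns the names of the
# scripts that occur at least once in $text. Assumes $text holds decoded
# characters, not raw UTF-8 bytes.
sub scripts_in {
    my ($text) = @_;
    my @found;
    push @found, 'Han'      if $text =~ /\p{Script=Han}/;
    push @found, 'Hiragana' if $text =~ /\p{Script=Hiragana}/;
    push @found, 'Katakana' if $text =~ /\p{Script=Katakana}/;
    push @found, 'Latin'    if $text =~ /\p{Script=Latin}/;
    return @found;
}

# Sample strings written as code point escapes so the file's own
# encoding does not matter.
my %samples = (
    latin    => "hello",
    kanji    => "\x{6F22}\x{5B57}",   # two Han characters
    hiragana => "\x{3072}\x{3089}",   # two hiragana characters
    katakana => "\x{30AB}\x{30BF}",   # two katakana characters
);

for my $name (sort keys %samples) {
    my @scripts = scripts_in($samples{$name});
    printf "%s: %s\n", $name, @scripts ? join(', ', @scripts) : 'none';
}
```

Since Japanese text usually mixes scripts, a test for "Japanese" would likely need to combine several of these, for example a character class such as [\p{Han}\p{Hiragana}\p{Katakana}], and Han characters alone cannot distinguish Chinese from Japanese.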

answered Sep 23 '22 01:09

Jon Purdy
