I need to use the following regular expression to validate some Asian characters
$regexp = "/^[\-'\u2e80-\u9fff\sa-zA-Z.]+$/"; // with warning $regexp = "/^[\-'\sa-zA-Z.]+$/"; // without warning
preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X.
Do you know how to change the regular expression pattern so that I can validate the Asian characters from \u2e80-\u9fff
I am using the latest XAMPP
Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1
The P is Python identifier for a named capture group. You will see P in regex used in jdango and other python based regex implementations.
The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API.
\\. matches the literal character . . the first backslash is interpreted as an escape character by the Emacs string reader, which combined with the second backslash, inserts a literal backslash character into the string being read. the regular expression engine receives the string \. html?\ ' .
Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.
PCRE does not support the \uXXXX
syntax. Use \x{XXXX}
instead. See here.
Your \u2e80-\u9fff
range is also equivalent to
\p{InCJK_Radicals_Supplement}\p{InKangxi_Radicals}\p{InIdeographic_Description_Characters}\p{InCJK_Symbols_and_Punctuation}\p{InHiragana}\p{InKatakana}\p{InBopomofo}\p{InHangul_Compatibility_Jamo}\p{InKanbun}\p{InBopomofo_Extended}\p{InKatakana_Phonetic_Extensions}\p{InEnclosed_CJK_Letters_and_Months}\p{InCJK_Compatibility}\p{InCJK_Unified_Ideographs_Extension_A}\p{InYijing_Hexagram_Symbols}\p{InCJK_Unified_Ideographs}
Don't forget to add the u
modifier (/regex here/u
) if you're dealing with UTF-8. If you're dealing with another multi-byte encoding, you must first convert it to UTF-8.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With