
Regular expression - PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X

Tags:

regex

pcre

I need to use the following regular expression to validate some Asian characters:

 $regexp = "/^[\-'\u2e80-\u9fff\sa-zA-Z.]+$/"; // with warning
 $regexp = "/^[\-'\sa-zA-Z.]+$/";              // without warning

preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X.

Do you know how to change the regular expression pattern so that I can validate the Asian characters in the range \u2e80-\u9fff?

I am using the latest XAMPP:

Apache/2.2.14 (Win32) DAV/2 mod_ssl/2.2.14 OpenSSL/0.9.8l mod_autoindex_color PHP/5.3.1 mod_apreq2-20090110/2.7.1 mod_perl/2.0.4 Perl/v5.10.1 
q0987 asked Aug 21 '10 17:08




1 Answer

PCRE does not support the \uXXXX syntax. Use \x{XXXX} instead, so \u2e80-\u9fff becomes \x{2e80}-\x{9fff}.

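In engines that support Unicode block properties (the Perl- or Java-style \p{In...} names), your \u2e80-\u9fff range also corresponds roughly to the blocks below. Note that PCRE's \p{...} accepts general categories and script names but not block names, so with preg_match() the \x{2e80}-\x{9fff} range is the form to use: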

\p{InCJK_Radicals_Supplement}\p{InKangxi_Radicals}\p{InIdeographic_Description_Characters}\p{InCJK_Symbols_and_Punctuation}\p{InHiragana}\p{InKatakana}\p{InBopomofo}\p{InHangul_Compatibility_Jamo}\p{InKanbun}\p{InBopomofo_Extended}\p{InKatakana_Phonetic_Extensions}\p{InEnclosed_CJK_Letters_and_Months}\p{InCJK_Compatibility}\p{InCJK_Unified_Ideographs_Extension_A}\p{InYijing_Hexagram_Symbols}\p{InCJK_Unified_Ideographs}

Don't forget to add the u modifier (/regex here/u) if you're dealing with UTF-8. If you're dealing with another multi-byte encoding, you must first convert it to UTF-8.
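
As a minimal sketch of the two changes above (\x{XXXX} in place of \uXXXX, plus the u modifier), the pattern from the question could be rewritten as follows. The function name isValidName is hypothetical and only there for illustration; the character class itself is unchanged from the question. The pattern is written in a single-quoted string so that PHP passes the backslash escapes to PCRE untouched.

```php
<?php
// Hypothetical helper: returns true when $input consists only of the
// characters allowed by the question's pattern (hyphen, apostrophe,
// U+2E80..U+9FFF, whitespace, ASCII letters, and the dot).
function isValidName($input)
{
    // \x{2e80}-\x{9fff} replaces \u2e80-\u9fff; the trailing "u" makes PCRE
    // treat both the pattern and the subject as UTF-8.
    $regexp = '/^[\-\'\x{2e80}-\x{9fff}\sa-zA-Z.]+$/u';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, when the subject is not valid UTF-8).
    return preg_match($regexp, $input) === 1;
}

var_dump(isValidName('山田 太郎'));        // CJK ideographs inside the range
var_dump(isValidName("O'Neil-Smith Jr.")); // ASCII letters and punctuation
var_dump(isValidName('abc123'));           // digits are not in the class
```

Input in another encoding would need converting to UTF-8 before calling this, for instance with mb_convert_encoding() or iconv(), assuming the relevant extension is available. Also note that the range stops at U+9FFF, so characters above it (Hangul syllables, for example) are rejected by this pattern.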

Artefacto answered Sep 21 '22 06:09
