
Detect if PCRE was built without the --enable-unicode-properties or --enable-utf8 configuration switches

Tags: php, utf-8, pcre

I have a PHP library that uses a number of regular expressions featuring the \p and \P Unicode property escapes for multibyte strings, e.g.

((((?:\P{M}\p{M}*)+?)|(\'[^\']*\')|(\"[^\"]*\"))!)?\$?([a-z]{1,3})\$?(\d+)

While this works on most builds, I've had a few reports of the regexp returning an error.

Depending on the platform, the error message from PCRE is either:

Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset n

or

Compilation failed: support for \P, \p, and \X has not been compiled at offset n

I know that I can probably test a regexp that uses \P at the beginning of my code and trap any returned error. I could then use the result to set a compatibility flag and, based on that flag, fall back to a degraded (non-UTF-8) regexp without \P in the main body of my code.
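A minimal sketch of that probe-and-fallback idea follows. The helper name `pcreHasUnicodeProperties()`, the `/` delimiters, and the degraded pattern (where `.` stands in for one `\P{M}\p{M}*` sequence) are illustrative assumptions, not part of the library in question.

```php
<?php
// Hypothetical helper: true when PCRE can compile \p / \P escapes.
function pcreHasUnicodeProperties()
{
    static $supported = null;
    if ($supported === null) {
        // @ hides the "Compilation failed" warning. preg_match() returns
        // false (not 0) when the pattern cannot be compiled.
        $supported = @preg_match('/\p{L}/', 'a') === 1;
    }
    return $supported;
}

// Pattern from the question, wrapped in / delimiters (assumed).
$unicodePattern  = '/((((?:\P{M}\p{M}*)+?)|(\'[^\']*\')|(\"[^\"]*\"))!)?\$?([a-z]{1,3})\$?(\d+)/';
// Degraded variant (assumed): "." replaces the \P{M}\p{M}* sequence,
// keeping the capture group numbering unchanged.
$fallbackPattern = '/(((.+?)|(\'[^\']*\')|(\"[^\"]*\"))!)?\$?([a-z]{1,3})\$?(\d+)/';

// The compatibility flag decides which pattern the rest of the code uses.
$pattern = pcreHasUnicodeProperties() ? $unicodePattern : $fallbackPattern;
```

The static variable means the probe is compiled only once per request, so the check adds little overhead.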

I was wondering whether there is a simpler way to identify whether PCRE was built without the --enable-unicode-properties or --enable-utf8 configuration switches. PHP provides access to the PCRE_VERSION constant, but that doesn't help identify whether \P support is enabled.

Mark Baker asked Dec 22 '10 13:12


2 Answers

Other than trying it, I think the only way is to use the pcretest command line tool, with the -C option (compile-time options):

bash-4.1.5$ pcretest -C
   No UTF-8 support
   No Unicode properties support
   Newline sequence is LF
   \R matches all Unicode newlines
   Internal link size = 2
   POSIX malloc threshold = 10
   Default match limit = 10000000
   Default recursion depth limit = 10000000
   Match recursion uses stack
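If that output needs to be read from a script, a rough sketch of parsing it could look like the following. The function name `parsePcretestOutput()` is hypothetical, and the line wording is assumed to match the sample above; other PCRE versions may phrase these lines differently. Note also that PHP is often built against its own bundled PCRE, so a system pcretest binary may not describe the library PHP actually uses.

```php
<?php
// Hypothetical parser for the text printed by `pcretest -C`.
// Returns true/false per feature, or null when the line is not found.
function parsePcretestOutput($output)
{
    // Labels assumed from the sample output above.
    $features = array(
        'utf8'               => 'UTF-8 support',
        'unicode_properties' => 'Unicode properties support',
    );
    $result = array();
    foreach ($features as $key => $label) {
        $quoted = preg_quote($label, '/');
        if (preg_match('/^\s*No ' . $quoted . '\s*$/mi', $output)) {
            $result[$key] = false;   // "No ... support" line present
        } elseif (preg_match('/^\s*' . $quoted . '\s*$/mi', $output)) {
            $result[$key] = true;    // "... support" line present
        } else {
            $result[$key] = null;    // unknown wording or missing line
        }
    }
    return $result;
}

// Sample text copied from the output above.
$sample = "  No UTF-8 support\n  No Unicode properties support\n  Newline sequence is LF\n";
$flags = parsePcretestOutput($sample);
```

In real use the input would come from running the tool (for example via `shell_exec('pcretest -C')`), which assumes the binary is installed and that shell execution is permitted on the host.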
netcoder answered Nov 16 '22 23:11


While comments suggest checking for PREG_BAD_UTF8_ERROR, the PHP source (http://lxr.php.net/xref/PHP_5_6/ext/pcre/php_pcre.c#141) suggests this constant is always available if PCRE is. Indeed, it seems --enable-unicode-properties is a switch of the PCRE library itself and is simply not exposed by PHP. The only thing I can imagine is running a simple regexp once with warnings suppressed.
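Since the two switches are independent in the PCRE builds discussed here, one possible sketch probes each of them with its own trivial pattern. The function name `pcreProbe()` and the array keys are hypothetical.

```php
<?php
// Hypothetical helper: probes the two build options separately.
// Each pattern is compiled once with warnings suppressed; preg_match()
// returns false when compilation fails, and 1 on a successful match.
function pcreProbe()
{
    return array(
        // The /u modifier requires UTF-8 support in the PCRE library.
        'utf8' => @preg_match('/./u', 'a') === 1,
        // \p{L} requires Unicode property support.
        'unicode_properties' => @preg_match('/\p{L}/', 'a') === 1,
    );
}

$support = pcreProbe();
```

A caller can then decide, for instance, to keep the `/u` modifier but drop `\p` escapes when only `$support['utf8']` is true.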

chx answered Nov 16 '22 21:11