
preg_match and UTF-8 in PHP

I'm trying to search a UTF8-encoded string using preg_match.

preg_match('/H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);
echo $a_matches[0][1];

This should print 1, since "H" is at index 1 in the string "¡Hola!", but it prints 2. So it seems the subject is not being treated as a UTF-8 string, even though I'm passing the "u" modifier in the regular expression.

I have the following settings in my php.ini, and other UTF8 functions are working:

mbstring.func_overload = 7
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off

Any ideas?

asked Nov 12 '09 at 20:11 by JW.



2 Answers

Although the u modifier makes both the pattern and subject be interpreted as UTF-8, the captured offsets are still counted in bytes.
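
To see where the 2 comes from, here is a minimal sketch. It assumes the mbstring extension is loaded and that mbstring.func_overload is not replacing strlen(); the variable names are only illustrative.

```php
<?php
// "¡" (U+00A1) is encoded in UTF-8 as the two bytes 0xC2 0xA1.
$str = "\xC2\xA1Hola!";

$byteLength = strlen($str);             // counts bytes
$charLength = mb_strlen($str, 'UTF-8'); // counts UTF-8 characters

preg_match('/H/u', $str, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];                 // offset of "H", measured in bytes

// Prints: bytes=7 chars=6 offset=2
printf("bytes=%d chars=%d offset=%d\n", $byteLength, $charLength, $byteOffset);
```

The "H" comes after one character, but that character occupies two bytes, so the reported offset is 2.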

You can use mb_strlen to get the length in UTF-8 characters rather than bytes:

$str = "\xC2\xA1Hola!";
preg_match('/H/u', $str, $a_matches, PREG_OFFSET_CAPTURE);
echo mb_strlen(substr($str, 0, $a_matches[0][1]));
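
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: byte_offset_to_char_offset() is a hypothetical helper name, not a PHP built-in, and it assumes the subject is valid UTF-8 and that substr() has not been overloaded.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): convert a byte offset reported by
 * preg_match() into a character offset. Assumes $subject is valid UTF-8.
 */
function byte_offset_to_char_offset(string $subject, int $byteOffset): int
{
    // Take the bytes that precede the match, then count them as characters.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$str = "\xC2\xA1Hola!";
$charOffset = null;

if (preg_match('/H/u', $str, $matches, PREG_OFFSET_CAPTURE) === 1) {
    $charOffset = byte_offset_to_char_offset($str, $matches[0][1]);
    echo $charOffset, "\n"; // prints 1
}
```

Note that with mbstring.func_overload = 7, as in the question, substr() is itself replaced by mb_substr(), so the cut is no longer done in bytes. In that setup mb_strcut(), which always cuts by bytes, may be the safer call. The func_overload setting was deprecated in PHP 7.2 and removed in PHP 8.0.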

answered Sep 21 '22 at 23:09 by Gumbo


Try adding (*UTF8) at the start of the regex, inside the delimiters:

preg_match('/(*UTF8)H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);

Magic, thanks to a comment in https://www.php.net/manual/function.preg-match.php#95828

answered Sep 25 '22 at 23:09 by Natxet