I'm trying to search a UTF8-encoded string using preg_match.
preg_match('/H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE); echo $a_matches[0][1];
This should print 1, since "H" is at index 1 in the string "¡Hola!". But it prints 2. So it seems like it's not treating the subject as a UTF8-encoded string, even though I'm passing the "u" modifier in the regular expression.
I have the following settings in my php.ini, and other UTF8 functions are working:
mbstring.func_overload = 7
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
Any ideas?
Definition and Usage: the preg_match() function returns whether a match was found in a string.
Return Values: preg_match() returns 1 if the pattern matches the given subject, 0 if it does not, or false on failure. This function may return Boolean false, but may also return a non-Boolean value which evaluates to false, so compare the result with the === operator.
preg_match() is case-sensitive by default. Add the letter "i" after the closing delimiter of the pattern to perform a case-insensitive match.
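The return values and the "i" modifier described above can be sketched as follows. This is a minimal illustration, not code from the thread: describe_match() is a hypothetical helper name, and the error case assumes that a subject containing invalid UTF-8 makes preg_match() return false when the "u" modifier is set.

```php
<?php
// Hypothetical helper: reports the three possible outcomes of preg_match().
function describe_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    // Strict comparison is needed because 0 (no match) also evaluates to false.
    if ($result === false) {
        return 'error ' . preg_last_error();
    }

    return $result === 1 ? 'match' : 'no match';
}

echo describe_match('/hola/u', "\xC2\xA1Hola!"), "\n";  // case-sensitive pattern
echo describe_match('/hola/iu', "\xC2\xA1Hola!"), "\n"; // "i" added after the closing delimiter
echo describe_match('/H/u', "\xC2Hola"), "\n";          // subject is not valid UTF-8
```

The strict === checks keep "no match" (0) and "failure" (false) apart, which a loose if (!$result) test would not do.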
Although the u modifier makes both the pattern and subject be interpreted as UTF-8, the captured offsets are still counted in bytes.
You can use mb_strlen to get the length in UTF-8 characters rather than bytes:
$str = "\xC2\xA1Hola!"; preg_match('/H/u', $str, $a_matches, PREG_OFFSET_CAPTURE); echo mb_strlen(substr($str, 0, $a_matches[0][1]));
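One caveat with the php.ini from the question: mbstring.func_overload = 7 also overloads substr(), so it may count characters instead of bytes and the conversion would then be off. The sketch below avoids that by using mb_strcut(), which takes its start and length in bytes, and by passing the encoding explicitly. The helper name byte_offset_to_char_offset() is made up for this example, and the code assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: converts a byte offset reported by preg_match()
// into a character offset within a UTF-8 string.
function byte_offset_to_char_offset(string $subject, int $byteOffset): int
{
    // mb_strcut() measures start and length in bytes, unlike mb_substr().
    $prefix = mb_strcut($subject, 0, $byteOffset, 'UTF-8');

    // Count the characters that precede the match.
    return mb_strlen($prefix, 'UTF-8');
}

$str = "\xC2\xA1Hola!";

if (preg_match('/H/u', $str, $matches, PREG_OFFSET_CAPTURE) === 1) {
    $byteOffset = $matches[0][1];
    $charOffset = byte_offset_to_char_offset($str, $byteOffset);
    echo "byte offset: $byteOffset, character offset: $charOffset\n";
}
```

For the string in the question this should report byte offset 2 and character offset 1, because the leading "¡" occupies two bytes in UTF-8.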
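Try adding (*UTF8) at the start of the pattern, inside the delimiters:
preg_match('/(*UTF8)H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);
This comes from a comment at https://www.php.net/manual/function.preg-match.php#95828. Note that (*UTF8) enables the same UTF-8 mode as the "u" modifier, so the captured offset is still counted in bytes.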