preg_match and UTF-8 in PHP

Tags: php, unicode, utf-8, pcre

I'm trying to search a UTF8-encoded string using preg_match.

preg_match('/H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);
echo $a_matches[0][1];

I expected this to print 1, since "H" is at character index 1 in the string "¡Hola!", but it prints 2. So it seems the subject is not being treated as a UTF-8 string, even though I'm passing the "u" modifier in the regular expression.

I have the following settings in my php.ini, and other UTF8 functions are working:

mbstring.func_overload = 7
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off

Any ideas?

asked Nov 12 '09 20:11

JW.

2 Answers

Although the u modifier makes both the pattern and subject be interpreted as UTF-8, the captured offsets are still counted in bytes.
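To see the byte counting concretely, the offset from preg_match can be compared with a byte-based and a character-based search for the same letter. This is only a minimal sketch: it assumes the mbstring extension is loaded and that strpos is the plain byte-based function (no mbstring.func_overload), and the variable names are illustrative.

```php
<?php
$subject = "\xC2\xA1Hola!"; // "¡Hola!": "¡" is the two bytes 0xC2 0xA1

preg_match('/H/u', $subject, $m, PREG_OFFSET_CAPTURE);
$pregOffset = $m[0][1];                              // offset reported by preg_match
$byteOffset = strpos($subject, 'H');                 // byte-based search
$charOffset = mb_strpos($subject, 'H', 0, 'UTF-8');  // character-based search

// preg_match agrees with the byte-based search, not the character-based one.
var_dump($pregOffset === $byteOffset);
echo $byteOffset, ' bytes vs ', $charOffset, " characters\n";
```

The two-byte "¡" in front of "H" accounts for the difference between the two numbers.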

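You can convert the byte offset into a character offset by taking the part of the string before the match and measuring it with mb_strlen. Note that this assumes substr operates on bytes; with mbstring.func_overload = 7, as in the question's php.ini, substr is overloaded with its multibyte counterpart, so that setting would need to be off (it was deprecated in PHP 7.2 and removed in PHP 8.0):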

$str = "\xC2\xA1Hola!";
preg_match('/H/u', $str, $a_matches, PREG_OFFSET_CAPTURE);
echo mb_strlen(substr($str, 0, $a_matches[0][1]));
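The same conversion can be wrapped in a small function and applied to every match. The sketch below is one possible way to do that, not a built-in: byte_offset_to_char_offset is a hypothetical helper name, the sample string is made up for illustration, and it assumes PHP 7 or later, valid UTF-8 input, the mbstring extension, and a byte-based substr.

```php
<?php
// Hypothetical helper (not part of PHP): convert a byte offset, as reported
// by PREG_OFFSET_CAPTURE, into a character offset. Assumes $subject is valid
// UTF-8 and that substr() is the plain byte-based function.
function byte_offset_to_char_offset(string $subject, int $byteOffset): int
{
    // Take the bytes before the match, then count them as UTF-8 characters.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

// Illustrative subject: "¡Hola! ¿Qué tal?"
$subject = "\xC2\xA1Hola! \xC2\xBFQu\xC3\xA9 tal?";

// Find every "a"; each entry of $matches[0] is [matched text, byte offset].
preg_match_all('/a/u', $subject, $matches, PREG_OFFSET_CAPTURE);

$charOffsets = [];
foreach ($matches[0] as $match) {
    $charOffsets[] = byte_offset_to_char_offset($subject, $match[1]);
}

echo implode(', ', $charOffsets), "\n";
```

Passing 'UTF-8' to mb_strlen explicitly avoids depending on the configured internal encoding.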

answered Sep 21 '22 23:09

Gumbo

Try adding (*UTF8) at the start of the pattern, just inside the delimiters:

preg_match('/(*UTF8)H/u', "\xC2\xA1Hola!", $a_matches, PREG_OFFSET_CAPTURE);

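Magic, thanks to a comment at https://www.php.net/manual/function.preg-match.php#95828. Note that (*UTF8) switches the pattern to UTF-8 mode much as the u modifier does, so the offsets reported with PREG_OFFSET_CAPTURE should still be byte offsets.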

answered Sep 25 '22 23:09

Natxet
