Bug in PHP Preg engine: look-around Unicode issue

Tags:

php

Why does the following JS code work:

"آرد".replace(/(?=.)/g, '!'); // returns "!آ!ر!د"

while its PHP equivalent returns '!�!�!�!�!�!�'?

preg_replace('/(?=.)/u', '!', 'آرد'); // returns '!�!�!�!�!�!�'

The PHP version gives the expected result only in versions 4.3.5 to 5.0.5 and 5.1.1 to 5.1.6.

See: http://3v4l.org/jrV0W


asked Feb 18 '13 07:02

1 Answer

If you simply add the /u modifier, the pattern is supposed to be treated as UTF-8. The second example works because:

Since PHP 5.1, you can use \p{L}, which can be read as "any kind of letter from any language."
In addition to the standard notation \p{L}, Java, Perl, PCRE, and now PHP also accept the shorthand \pL. The shorthand only works with single-letter Unicode properties.
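A minimal sketch of the property-based approach, assuming PHP 7 or later (needed for the "\u{...}" string escapes) and assuming the goal is simply to put ! in front of every letter. The variable names are illustrative only. Because \p{L} can only succeed at the start of a whole letter, a lookahead on it presumably never fires in the middle of a multibyte sequence, unlike a lookahead on ".".

```php
<?php
// Sketch, assuming PHP 7+ (required for the "\u{...}" escapes below).
// Assumed goal: insert '!' in front of every letter.
$word = "\u{0622}\u{0631}\u{062F}"; // the three Arabic letters from the question

// Standard notation: \p{L} = "any kind of letter from any language".
$long = preg_replace('/(?=\p{L})/u', '!', $word);

// Single-letter property shorthand: \pL means the same thing.
$short = preg_replace('/(?=\pL)/u', '!', $word);

var_dump($long === "!\u{0622}!\u{0631}!\u{062F}"); // bool(true)
var_dump($short === $long);                         // bool(true)
```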

UPDATE: Why does preg_replace('/(?=.)/u', '!', 'آرد') return '!�!�!�!�!�!�'?

As @MarkFox says, the reason is that, in this context, preg_replace() assumes one byte per character, while the characters you're "RegExing" are multibyte. That's why your output has twice as many matches as you'd expect: it matches at each byte of each character (each of these Arabic letters is two bytes in UTF-8).
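To make the byte counting concrete, here is a small sketch (PHP 7+ assumed, only the bundled PCRE functions used). It does not reproduce PHP's internal matching loop; it only simulates "one zero-width match before every byte" by hand, which yields the same six ! and six stray bytes as the output in the question.

```php
<?php
// Sketch, assuming PHP 7+ ("\u{...}" escapes); only bundled PCRE functions are used.
$word = "\u{0622}\u{0631}\u{062F}"; // three Arabic letters, two UTF-8 bytes each

echo strlen($word), PHP_EOL;                  // 6  (bytes)
echo preg_match_all('/./u', $word), PHP_EOL;  // 3  (characters, UTF-8 mode)
echo preg_match_all('/./', $word), PHP_EOL;   // 6  (byte mode, no /u)
echo bin2hex($word), PHP_EOL;                 // d8a2d8b1d8af

// Hand-made simulation of "one zero-width match before every byte":
// split into single bytes and glue them back together with '!'.
$byteWise = '!' . implode('!', str_split($word));
echo bin2hex($byteWise), PHP_EOL;             // 21d821a221d821b121d821af

// The result is no longer valid UTF-8, so a /u pattern refuses it:
// preg_match() returns false (PREG_BAD_UTF8_ERROR) instead of 0 or 1.
var_dump(preg_match('//u', $byteWise));              // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Each lone lead byte (d8) or continuation byte (a2, b1, af) in $byteWise is what shows up as one of the question marks when the string is displayed as UTF-8.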

No matter what you do with your document encoding, you will need to use Unicode character properties to get this working.

What about that weird symbol?

When you see that "weird square symbol with a question mark inside", otherwise known as the REPLACEMENT CHARACTER (U+FFFD), it usually indicates a stray byte in the range 80-FF (128-255) that is not part of a valid multibyte sequence, which the system is trying to render as UTF-8.

That entire byte range is invalid for single-byte characters in UTF-8, but those bytes are all very common in Western encodings such as ISO-8859-1.
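A quick check of both claims, again a sketch assuming PHP 7+. latin1ByteToUtf8() is a hypothetical helper written only for this illustration; in real code you would reach for an existing converter such as mb_convert_encoding() or iconv() if that extension is available. The helper relies on the fact that every ISO-8859-1 byte value equals its Unicode code point.

```php
<?php
// Sketch, assuming PHP 7+.
// 1) Every single byte in 0x80-0xFF is invalid on its own in UTF-8:
//    preg_match() with /u returns false for a subject that is not valid UTF-8.
$validAlone = 0;
for ($b = 0x80; $b <= 0xFF; $b++) {
    if (preg_match('//u', chr($b)) !== false) {
        $validAlone++;
    }
}
echo $validAlone, PHP_EOL; // 0

// 2) In ISO-8859-1 the same byte is an ordinary character, e.g. 0xE9 is "é"
//    (U+00E9). Hypothetical helper: re-encode one Latin-1 byte as UTF-8 by hand.
function latin1ByteToUtf8(int $b): string
{
    if ($b < 0x80) {
        return chr($b); // ASCII bytes are identical in both encodings
    }
    // Code points 0x80-0xFF need two UTF-8 bytes: 110xxxxx 10xxxxxx
    return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
}

echo bin2hex(latin1ByteToUtf8(0xE9)), PHP_EOL;   // c3a9
var_dump(latin1ByteToUtf8(0xE9) === "\u{00E9}"); // bool(true)
```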

answered Sep 27 '22 23:09

Tom Sarduy


Asked by Handsome Nerd. Tags: regex, php