Why does the following JS code work:
"آرد".replace(/(?=.)/g,'!'); // returns: "!آ!ر!د"
But why does its PHP equivalent return '!�!�!�!�!�!�'?
preg_replace('/(?=.)/u', '!', 'آرد'); // returns '!�!�!�!�!�!�'
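A minimal JavaScript sketch of why the JS version behaves as expected, assuming a modern engine such as Node.js. JS strings are sequences of UTF-16 code units, and each of these three Arabic letters (U+0622, U+0631, U+062F) fits in a single code unit, so the lookahead fires exactly once per letter. The variable names are illustrative only.

```javascript
// The word from the question; in UTF-16 each letter is one code unit.
const jsWord = "آرد";
console.log(jsWord.length); // 3

// (?=.) succeeds at positions 0, 1 and 2, but not at the end of the string,
// so one '!' is inserted before each letter.
const jsResult = jsWord.replace(/(?=.)/g, "!");
console.log(jsResult); // !آ!ر!د
```

The PHP call works on the UTF-8 bytes of the string instead, which is where the two languages diverge.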
This works only in PHP versions 4.3.5 to 5.0.5 and 5.1.1 to 5.1.6.
See: http://3v4l.org/jrV0W
If you simply add the /u modifier, the pattern is supposed to be treated as UTF-8. The second example works because it uses \p{L}, which can be translated as "any kind of letter from any language."
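The same idea can be sketched in JavaScript, the language of the working example, assuming an engine with ES2018 Unicode property escapes (for example a current Node.js). With the u flag, \p{L} matches any code point whose general category is a letter, so the lookahead fires before each Arabic letter and skips everything else. This is a JS analogue, not the PHP code from the answer.

```javascript
const letterText = "آرد";

// With the u flag, \p{L} matches code points in any Letter category (Lu, Ll, Lt, Lm, Lo).
// The Arabic letters here are category Lo, so '!' goes before each one.
const withLetters = letterText.replace(/(?=\p{L})/gu, "!");
console.log(withLetters); // !آ!ر!د

// Non-letters such as "@" and "1" do not trigger the lookahead.
const mixed = "آرد@1".replace(/(?=\p{L})/gu, "!");
console.log(mixed); // !آ!ر!د@1
```

Using a property class instead of `.` also documents the intent: markers go before letters only.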
In PCRE, \p{L} can also be written with the shorthand \pL; the shorthand only works with single-letter Unicode properties.
UPDATE: Why does preg_replace('/(?=.)/u', '!', 'آرد') return '!�!�!�!�!�!�'?
As @MarkFox says, the reason is that preg_replace() assumes one byte per character here, while the characters you're "RegExing" are multibyte. That's why your output has twice as many matches as you'd expect: it is matching each byte of each character, and each of these Arabic letters takes two bytes in UTF-8.
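A small JavaScript simulation of that byte-per-character behavior, offered as an illustration rather than PHP's actual implementation. It assumes the standard `TextEncoder`/`TextDecoder` globals (available in Node.js and browsers) and uses a hypothetical helper, `insertBeforeEachByte`, to mimic a pattern that matches once per byte. Inserting '!' (0x21) before every byte splits each two-byte sequence, and decoding the result as UTF-8 reproduces the output from the question.

```javascript
const arabicWord = "آرد";

// UTF-8 encoding: D8 A2, D8 B1, D8 AF (two bytes per letter).
const utf8Bytes = new TextEncoder().encode(arabicWord);
const codePointCount = [...arabicWord].length;
console.log(codePointCount, utf8Bytes.length); // 3 6

// Hypothetical helper: put a marker byte before every byte,
// i.e. one "match" per byte rather than per character.
function insertBeforeEachByte(bytes, marker = 0x21) {
  const out = [];
  for (const b of bytes) {
    out.push(marker, b);
  }
  return Uint8Array.from(out);
}

// Each lead or continuation byte is now isolated between '!' bytes,
// so the non-fatal UTF-8 decoder replaces every one of them with U+FFFD.
const mangledText = new TextDecoder("utf-8").decode(insertBeforeEachByte(utf8Bytes));
console.log(mangledText); // !�!�!�!�!�!�
```

Six markers for three letters is exactly the "double the matches" described above.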
No matter what you do with your document encoding, you will need to use Unicode character properties to get this working.
What about that weird symbol?
When you see that "weird square symbol with a question mark inside", otherwise known as the REPLACEMENT CHARACTER (U+FFFD), that usually indicates you have a byte in the range 80-FF (128-255) and the system is trying to render it as UTF-8.
That entire byte range is invalid for single-byte characters in UTF-8, but those bytes are all very common in Western encodings such as ISO-8859-1.
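A quick JavaScript check of that claim, assuming the standard `TextDecoder` global. Every byte from 0x80 to 0xFF, decoded on its own as UTF-8 in the default non-fatal mode, becomes U+FFFD, while ISO-8859-1 maps each byte straight to the code point with the same value. The helper names are illustrative only.

```javascript
const utf8Decoder = new TextDecoder("utf-8");

// Decode one byte as UTF-8; invalid or incomplete input becomes U+FFFD.
function decodeLoneByteUtf8(byte) {
  return utf8Decoder.decode(Uint8Array.of(byte));
}

// ISO-8859-1: byte value N maps directly to code point U+00NN.
function decodeLoneByteLatin1(byte) {
  return String.fromCharCode(byte);
}

// Check the whole 0x80-0xFF range.
let allReplaced = true;
for (let b = 0x80; b <= 0xff; b++) {
  if (decodeLoneByteUtf8(b) !== "\uFFFD") {
    allReplaced = false;
  }
}

console.log(allReplaced); // true
console.log(decodeLoneByteLatin1(0xe9)); // é
```

The same byte 0xE9 is a perfectly good "é" in ISO-8859-1 but only a replacement character when read as lone UTF-8.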