Let's say I have a UTF-8 text like this:
âàêíóôõ <br> âàêíóôõ <br> âàêíóôõ
I want to replace <br> with <br />. Do I need to use mb_str_replace, or can I use str_replace, considering that <, b, r, / and > are all single-byte characters?
A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character). Each character stored in the string may occupy more than one byte.
Multibyte Character Set (MBCS): A character set encoded with a variable number of bytes for each character. Many large character sets have been defined as multi-byte character sets in order to keep strict compatibility with the standards of the ASCII subset and ISO/IEC 2022.
The wcstombs() function converts the wide-character string pointed to by string into the multibyte array pointed to by dest. The converted string begins in the initial shift state.
The term “multibyte character” is defined by ISO C to denote a byte sequence that encodes an ideogram, no matter what encoding scheme is employed. All multibyte characters are members of the “extended character set.” A regular single-byte character is just a special case of a multibyte character.
Since str_replace is binary-safe and UTF-8 is a bijective encoding, you can use str_replace even if the search string or the replacement contains multi-byte characters, as long as all three parameters are encoded as UTF-8. UTF-8 is also self-synchronizing, so a byte-wise match of a valid UTF-8 search string can only begin on a character boundary.
That's why there isn't an mb_str_replace function in the first place.
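As a minimal sketch of the point above, using the sample text from the question and assuming the script file itself is saved as UTF-8 (the variable names are only illustrative):

```php
<?php
// Sample text from the question; this file is assumed to be saved as UTF-8.
$text = "âàêíóôõ <br> âàêíóôõ <br> âàêíóôõ";

// ASCII-only search and replacement: a plain byte-wise replace is enough.
$fixed = str_replace("<br>", "<br />", $text);

// A multi-byte search and replacement work the same way,
// because all three arguments are UTF-8 strings.
$swapped = str_replace("ê", "ë", $text);

echo $fixed, "\n";
echo $swapped, "\n";
```

Both results should still be valid UTF-8, since whole characters are replaced by whole characters.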
If your encoding is not bijective, i.e. there are multiple representations of the same string (for example <, which in UTF-7 can be expressed both as '+ADw-' and '<'), you should convert all strings to the same (bijective) encoding, apply str_replace, and then convert the result to the target encoding.
Reference for manipulating UTF-8 strings safely in PHP. There is no hard-and-fast rule. Some native PHP string functions can operate safely on UTF-8, some can with care, and some cannot.
There is no mb_str_replace(). Notice the section "UTF-8 Safe Functionality": explode() and str_replace() are safe as long as all of their arguments are valid UTF-8 strings.
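For explode(), a small illustration with a multi-byte delimiter, assuming again that the file is saved as UTF-8 (the sample string is invented):

```php
<?php
// Hypothetical sample data: both the fields and the delimiter are multi-byte in UTF-8.
$line = "âàê→íóô→õ";

// explode() compares bytes, which is safe here because the delimiter
// and the subject are both valid UTF-8.
$parts = explode("→", $line);

print_r($parts);
```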