This is my problem: my language (Portuguese) uses the ISO-8859-1 character encoding. When I want to access a character of a string like 'coração' (heart), I use:
mb_internal_encoding('ISO-8859-1');
$str = "coração";
$len = mb_strlen($str,'UTF-8');
for($i=0;$i<$len;++$i)
echo mb_substr($str, $i, 1, 'UTF-8')."<br/>";
This produces:
c o r a ç ã o
This works fine, but my concern is that mb_substr is not as fast as normal string access. I would like a simple way to do this, as with normal string character access: echo $str[$pos]. Is that possible?
Mbstring stands for multi-byte string functions. Mbstring is a PHP extension for handling non-ASCII strings, including converting strings between encodings. Multibyte character encoding schemes are used to represent more than 256 characters in a byte-oriented coding system.
A null-terminated multibyte string (NTMBS), or "multibyte string", is a sequence of nonzero bytes followed by a byte with value zero (the terminating null character). Each character stored in the string may occupy more than one byte.
mb_substr function is not fast as [...] like in normal string character access: echo $str[$pos].... It is possible?
No.
The multibyte functions have to check every character to determine how many bytes (1 to 4 in UTF-8) it occupies. That is exactly why character indexing ($a[n]) won't work: you don't know which byte(s) make up the nth character until you have read all the characters before it.
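To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumes this source file is saved as UTF-8, so "ç" and "ã" take two bytes each.
$str = "coração";

$byteLength = strlen($str);                  // counts bytes
$charLength = mb_strlen($str, 'UTF-8');      // counts characters

// Byte offset 4 holds only the first byte of "ç", not a whole character.
$fifthByte = bin2hex($str[4]);

// mb_substr has to walk the string from the start to locate character 4.
$fifthChar = mb_substr($str, 4, 1, 'UTF-8');

echo "$byteLength bytes, $charLength characters\n";
echo "byte 4 = 0x$fifthByte, character 4 = $fifthChar\n";
```

With a UTF-8 source file this should report 9 bytes but 7 characters, and $str[4] yields a lone 0xC3 byte rather than "ç".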
To speed things up a bit, you can look at the answers here: How to iterate UTF-8 string in PHP?
However, if your strings really are stored in ISO 8859-1 (Latin-1), you don't have to use the mb_ functions at all, since in that encoding every character is encoded in one byte.
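As a rough illustration of the single-byte case, the sketch below converts a UTF-8 literal to ISO-8859-1 once and then uses plain indexing. It assumes the source file is UTF-8 and that output goes to a UTF-8 page or terminal, which is why each byte is converted back before display; if your data and output are already Latin-1, both conversions can be dropped.

```php
<?php
// Assumes the source file is UTF-8; convert once to get real ISO-8859-1 bytes.
$utf8   = "coração";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$len   = strlen($latin1); // one byte per character in ISO-8859-1
$chars = [];
for ($i = 0; $i < $len; ++$i) {
    // Plain byte indexing is now also character indexing.
    // Convert back to UTF-8 only so the character displays correctly.
    $chars[] = mb_convert_encoding($latin1[$i], 'UTF-8', 'ISO-8859-1');
}

echo implode(' ', $chars), "\n";
```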
Try:
preg_match_all( "/./u", $str, $ar_chars );
print_r( $ar_chars );
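Note that preg_match_all fills a nested array, with the matched characters in element 0. A small sketch of using that result for indexed access follows; it assumes $str is valid UTF-8 (the u modifier requires it), and the added s modifier is my own addition so that "." also matches newlines.

```php
<?php
// Assumes $str holds valid UTF-8, which the "u" modifier requires.
$str = "coração";

// "s" lets "." match newlines too, so no character is skipped.
$count = preg_match_all('/./us', $str, $matches);
$chars = $matches[0]; // full-pattern matches: one array element per character

echo $chars[4], "\n"; // direct array lookup after the one-off split
echo $count, "\n";
```

The split still scans the whole string once and uses extra memory for the array, so it pays off mainly when you need many lookups on the same string.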