Wrong output when using array indexing on UTF-8 string

Tags: arrays, string, php, char, utf-8

I have encountered a problem when using a UTF-8 string. I want to read a single character from the string, for example:

$string = "üÜöÖäÄ";
echo $string[0];

I am expecting to see ü, but I get � -- why?

356

asked Jun 11 '11 11:06

bozd

1 Answer

Use mb_substr($string, 0, 1, 'utf-8') to get the character instead.

What happens in your code is that the expression $string[0] gets the first byte of the UTF-8 encoded representation of your string because PHP strings are effectively arrays of bytes (PHP does not internally recognize encodings).

Since the first character in your string is encoded as more than one byte (per the UTF-8 encoding rules), you are only getting part of the character. Furthermore, those rules make the byte you retrieve invalid as a character on its own, which is why you see the replacement character (�) instead of ü.
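To make the byte-level picture concrete, here is a minimal sketch that dumps the bytes involved. It assumes the source file is saved as UTF-8, that "ü" is stored in its precomposed form (U+00FC), and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// "ü" is U+00FC, which UTF-8 encodes as the two bytes 0xC3 0xBC.
$string = "üÜöÖäÄ";

$firstByte = $string[0];            // byte at offset 0 only
$firstChar = substr($string, 0, 2); // both bytes that make up "ü"

echo bin2hex($firstByte), "\n";     // c3
echo bin2hex($firstChar), "\n";     // c3bc

// A lone lead byte is not valid UTF-8; the complete two-byte sequence is.
var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($firstChar, 'UTF-8')); // bool(true)
```

The byte 0xC3 only announces that a two-byte sequence follows, so a terminal or browser that receives it alone has nothing valid to render.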

mb_substr knows the encoding rules, so it will not naively give you back just one byte; it will get as many as needed to encode the first character.
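The same idea extends to walking the whole string one character at a time. The following is a rough sketch rather than the only way to do it; it assumes a UTF-8 source file, precomposed characters, and the mbstring extension, and it uses only mb_strlen and mb_substr.

```php
<?php
$string = "üÜöÖäÄ";

// Count characters, not bytes (strlen() would report 12 here).
$length = mb_strlen($string, 'UTF-8');

$chars = [];
for ($i = 0; $i < $length; $i++) {
    // Offsets passed to mb_substr are character offsets, so each call
    // returns one complete character regardless of its byte width.
    $chars[] = mb_substr($string, $i, 1, 'UTF-8');
}

echo implode(' ', $chars), "\n"; // ü Ü ö Ö ä Ä
```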

You can see that $string[0] gives you back just one byte with:

$string = "üÜöÖäÄ";
echo strlen($string[0]); // prints 1: a single byte

While mb_substr gives you back two bytes:

$string = "üÜöÖäÄ";
echo strlen(mb_substr($string, 0, 1, 'utf-8')); // prints 2: the two bytes that encode "ü"

And these two bytes are in fact just one character (you need to use mb_strlen for this):

$string = "üÜöÖäÄ";
echo mb_strlen(mb_substr($string, 0, 1, 'utf-8'), 'utf-8'); // prints 1: one character

Finally, as Marwelln points out below, the situation becomes more tolerable if you use mb_internal_encoding to get rid of the 'utf-8' redundancy:

$string = "üÜöÖäÄ";
mb_internal_encoding('utf-8'); // default encoding for later mb_* calls
echo mb_strlen(mb_substr($string, 0, 1)); // prints 1

You can see most of the above in action.
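If relying on a global setting is undesirable, one possible alternative is a small wrapper that always passes the encoding explicitly. The function name utf8_char_at is hypothetical (it is not part of PHP or mbstring), and the sketch assumes PHP 7.1 or later for the nullable return type, a UTF-8 source file, and the mbstring extension.

```php
<?php
// Hypothetical helper, not part of PHP or mbstring: returns the character at
// a character offset, or null when the offset is past the end of the string.
function utf8_char_at(string $string, int $index): ?string
{
    $char = mb_substr($string, $index, 1, 'UTF-8');

    // mb_substr returns an empty string when nothing is left at that offset.
    return $char === '' ? null : $char;
}

$string = "üÜöÖäÄ";

echo utf8_char_at($string, 0), "\n"; // ü
echo utf8_char_at($string, 5), "\n"; // Ä
var_dump(utf8_char_at($string, 6));  // NULL
```

Pinning the encoding inside the helper keeps the result independent of whatever mb_internal_encoding happens to be set to elsewhere in the application.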

116

answered Sep 18 '22 17:09

Jon
