
PHP UTF-8 questions - If I create a string in PHP... is it in UTF-8?

Tags: php, unicode, utf-8

In PHP, if I create a string like this:

$str = "bla bla here is my string";

Will I then be able to use the mbstring functions to operate on that string as UTF-8?

// Will this work?
$length = mb_strlen($str);

Further, if I then have another string that I know is UTF-8 (say it was a POSTed form value, or a UTF-8 string from a database), can I then concatenate these two and not have any problems?

// What about this, will this work? 
$str = $str . $utf8_string_from_database;
Keith Palmer Jr. asked Dec 07 '22 08:12


1 Answer

First question: it depends on what exactly goes in the string.

In PHP (this was true up to PHP 5, and remains true in PHP 7 and 8), strings are just sequences of bytes. There is no implied or explicit character set associated with them; that's something the programmer must keep track of. So, if you only put valid UTF-8 bytes between the quotes (fairly easy if the file itself is encoded as UTF-8), then the string will be UTF-8, and you can safely use mb_strlen() on it.
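
To make the "sequence of bytes" point concrete, here is a minimal sketch. The variable names are only for illustration, and the byte escapes are used so the result does not depend on how the source file is saved.

```php
<?php
// "\xC3\xA9" is the two-byte UTF-8 encoding of "é";
// "\xE9" is the same letter as a single ISO-8859-1 byte.
$utf8   = "caf\xC3\xA9";
$latin1 = "caf\xE9";

// strlen() counts bytes, because bytes are all a PHP string holds.
$utf8Bytes   = strlen($utf8);   // 5
$latin1Bytes = strlen($latin1); // 4

// bin2hex() shows the raw bytes behind each string.
$utf8Hex   = bin2hex($utf8);    // "636166c3a9"
$latin1Hex = bin2hex($latin1);  // "636166e9"

echo $utf8Bytes, ' ', $utf8Hex, "\n";
echo $latin1Bytes, ' ', $latin1Hex, "\n";
```

Both variables display as "café" in a suitably configured viewer, yet PHP itself records nothing about which encoding either one uses.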

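Also, if you're using mbstring functions, you need to tell them which character set your string is in, either by setting the internal encoding (mb_internal_encoding() at runtime, or the mbstring.internal_encoding ini setting, which has been deprecated since PHP 5.6 in favor of default_charset) or by passing the optional encoding argument that most mbstring functions accept, usually as the last parameter. Since PHP 5.6 the default is UTF-8, but stating the encoding explicitly keeps the code from depending on server configuration.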
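
As a rough illustration of why the declared encoding matters, the sketch below measures the same bytes under two encodings. It assumes the mbstring extension is loaded; the variable names are made up for the example.

```php
<?php
$str = "caf\xC3\xA9"; // "café" as UTF-8 bytes

// Pass the encoding explicitly as the optional last argument...
$chars = mb_strlen($str, 'UTF-8');      // 4 characters
$wrong = mb_strlen($str, 'ISO-8859-1'); // 5: every byte counts as one character

// ...or set the internal encoding once and omit the argument afterwards.
mb_internal_encoding('UTF-8');
$charsDefault = mb_strlen($str);        // 4

echo "$chars $wrong $charsDefault\n";
```

Nothing fails when the wrong encoding is named; the function simply returns a different number, which is why the mismatch is easy to miss.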

Second question: yes, with caveats.

Two strings that are both independently valid UTF-8 can be safely byte-wise concatenated (like with PHP's . operator) and still be valid UTF-8. However, you can never be sure, without doing some work yourself, that a POSTed string is valid UTF-8. Database strings are a little easier, if you carefully set the connection character set, because most DBMSs will do any conversion for you.
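
One way to do that "work yourself" is to check incoming data with mb_check_encoding() before concatenating. This is only a sketch: utf8_or_null() is a hypothetical helper invented here, and the literal strings stand in for real POST values.

```php
<?php
// Hypothetical helper: returns the input unchanged if it is valid UTF-8,
// otherwise null so the caller can reject or convert it.
function utf8_or_null(string $input): ?string
{
    return mb_check_encoding($input, 'UTF-8') ? $input : null;
}

$str     = "bla bla here is my string";
$posted  = "na\xC3\xAFve"; // stands in for a POSTed value that is valid UTF-8
$invalid = "na\xEFve";     // same word as ISO-8859-1 bytes, not valid UTF-8

$safe = utf8_or_null($posted);
$bad  = utf8_or_null($invalid);

// Concatenating two valid UTF-8 strings yields valid UTF-8.
$combined = null;
if ($safe !== null) {
    $combined = $str . ' ' . $safe;
}

var_dump($bad === null);                          // bool(true)
var_dump(mb_check_encoding($combined, 'UTF-8'));  // bool(true)
```

What to do with a rejected value (refuse it, or convert it with mb_convert_encoding() from a known source encoding) is an application decision. For the database side, if the DBMS is MySQL and the connection goes through PDO, the connection character set is typically declared in the DSN, for example with charset=utf8mb4.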

chazomaticus answered Apr 28 '23 23:04
