I have read that MySQL >= 5.5.3 fully supports every possible character if you use the utf8mb4 encoding for a table or column: http://mathiasbynens.be/notes/mysql-utf8mb4
That looks nice, but I noticed that the mb_* functions in PHP do not seem to support it. I cannot find utf8mb4 anywhere in the list: http://php.net/manual/en/mbstring.supported-encodings.php
I have not only read about this, I also ran a test.
I added data to a MySQL utf8mb4 table using a PHP script where the internal encoding was set to UTF-8: mb_internal_encoding("UTF-8");
and, as expected, the characters look garbled once they are in the database.
Any idea how I can make PHP and MySQL use the same encoding (possibly a 4-byte one) and still have full support for any world language?
Also, why is utf8mb4 different from UTF-32?
MySQL supports multiple Unicode character sets:
utf8mb4: a UTF-8 encoding of the Unicode character set using one to four bytes per character.
utf8mb3: a UTF-8 encoding of the Unicode character set using one to three bytes per character.
utf8mb4 is supported from MySQL 5.5.3 onward (check your version with SELECT VERSION();). For better compatibility, you should always use utf8mb4 instead of utf8. For CHAR columns, utf8mb4 consumes more space, so, following MySQL's official recommendation, use VARCHAR instead of CHAR.
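As a rough illustration of the advice above, here is a minimal sketch of a table definition that uses utf8mb4 with a VARCHAR column. The helper name utf8mb4_table_ddl, the table name messages, the column names, and the utf8mb4_unicode_ci collation are all assumptions chosen for the example, not something the answer prescribes.

```php
<?php
// Hypothetical helper: returns DDL for a table whose text column uses utf8mb4.
function utf8mb4_table_ddl(string $table): string
{
    // Backtick-quote the identifier; embedded backticks are doubled.
    $name = '`' . str_replace('`', '``', $table) . '`';

    return "CREATE TABLE {$name} (\n"
        . "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
        . "  body VARCHAR(255) NOT NULL\n" // VARCHAR rather than CHAR, as suggested above
        . ") ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci";
}

// "messages" is a placeholder table name.
$ddl = utf8mb4_table_ddl('messages');
```

The resulting string can be passed to whatever database API you use to run statements.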
MySQL's utf8 encoding is not actual UTF-8. It's an encoding that is kinda like UTF-8, but only supports a subset of what UTF-8 supports. utf8mb4 is actual UTF-8. This difference is an internal implementation detail of MySQL. Both look like UTF-8 on the PHP side. Whether you use utf8 or utf8mb4, PHP will get valid UTF-8 in both cases.
What you need to make sure is that the connection encoding between PHP and MySQL is set to utf8mb4. If it's set to utf8, MySQL will not support all characters. You set this connection encoding using mysql_set_charset() (or mysqli_set_charset() with the mysqli extension), the PDO charset DSN connection parameter, or whatever other method is appropriate for your database API of choice.
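A minimal sketch of the PDO route, assuming the pdo_mysql extension is available. The function names utf8mb4_dsn, connect_utf8mb4 and connection_charset are hypothetical helpers made up for this example, and the host, database and credentials are placeholders.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests utf8mb4 for the connection.
function utf8mb4_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Opens the connection; all four arguments are placeholders you supply.
function connect_utf8mb4(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(utf8mb4_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

// Asks the server which character set it uses for this connection.
function connection_charset(PDO $pdo): string
{
    return (string) $pdo->query('SELECT @@character_set_connection')->fetchColumn();
}
```

If everything is wired up correctly, calling connection_charset() on a handle returned by connect_utf8mb4() should report utf8mb4.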
mb_internal_encoding just sets the default value for the $encoding parameter all mb_* functions have. It has nothing to do with MySQL.
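To see that "UTF-8" on the PHP side already covers 4-byte characters, here is a small sketch, assuming PHP 7 or later with the mbstring extension loaded. The sample string is arbitrary.

```php
<?php
mb_internal_encoding('UTF-8');

// "a" (1 byte), "€" U+20AC (3 bytes) and U+1F600 (4 bytes) in UTF-8.
$text = "a\u{20AC}\u{1F600}";

$bytes = strlen($text);                     // counts bytes: 1 + 3 + 4 = 8
$chars = mb_strlen($text);                  // counts characters: 3
$last  = mb_substr($text, 2, 1);            // the 4-byte character, extracted intact
$valid = mb_check_encoding($text, 'UTF-8'); // true: the string is valid UTF-8
```

So there is no separate "utf8mb4" entry to look for in the mbstring list: the name only exists on the MySQL side.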
UTF-8 and UTF-32 differ in how they encode characters. UTF-8 uses a minimum of 1 byte for a character and a maximum of 4. UTF-32 always uses 4 bytes for every character. UTF-16 uses a minimum of 2 bytes and a maximum of 4.
Due to its variable length, UTF-8 has a little bit of overhead. A character which can be encoded in 2 bytes in UTF-16 may take 3 bytes in UTF-8 (characters that need 4 bytes in UTF-8 also need 4 bytes in UTF-16); on the other hand, UTF-16 never uses less than 2 bytes. If you're storing lots of Asian text, UTF-16 may use less storage. If most of your text is English/ASCII, UTF-8 uses less storage. UTF-32 never uses less storage than the other two and usually uses the most.
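The size differences can be checked directly. This is a minimal sketch, assuming the mbstring extension. encoded_lengths is a hypothetical helper name, and the big-endian variants are used only so that no byte order mark can affect the counts.

```php
<?php
// Hypothetical helper: byte length of a UTF-8 string in three Unicode encodings.
function encoded_lengths(string $utf8): array
{
    return [
        'UTF-8'  => strlen($utf8),
        'UTF-16' => strlen(mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8')),
        'UTF-32' => strlen(mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8')),
    ];
}

$ascii = encoded_lengths('A');          // UTF-8: 1, UTF-16: 2, UTF-32: 4
$euro  = encoded_lengths("\u{20AC}");   // UTF-8: 3, UTF-16: 2, UTF-32: 4
$kanji = encoded_lengths("\u{65E5}");   // UTF-8: 3, UTF-16: 2, UTF-32: 4
$emoji = encoded_lengths("\u{1F600}");  // UTF-8: 4, UTF-16: 4, UTF-32: 4
```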
This is what I used, and it worked well for my problem with the euro sign (€) and a json_encode conversion failure.
PHP configuration script (API etc.):
header('Content-Type: text/html; charset=utf-8');
ini_set("default_charset", "UTF-8");
mb_internal_encoding("UTF-8");
// Note: iconv_set_encoding() is deprecated as of PHP 5.6; default_charset covers it there.
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "UTF-8");
MySQL tables (or specific columns):
utf8mb4
MySQL PDO connection:
$dsn = 'mysql:host=yourip;dbname=XYZ;charset=utf8mb4';
(...your connection ...)
Before executing queries (might not be required, since the DSN already sets the charset):
$dbh->exec("set names utf8mb4");