 

When is the correct time to use utf8_encode and utf8_decode?

Tags: php, mysql

Character encoding has always been a problem for me. I don't really understand when each of these functions should be used.

I set up all the databases I use with utf8_general_ci, as that seems to be a good 'general' start. In the past five minutes I have also learned that it is case-insensitive, so that's helpful.

But my question is: when should I use utf8_encode and utf8_decode? As far as I can tell, if I submit a form from a table on my website via $_POST, I need to utf8_encode() the value before I insert it into the database.

Then, when I pull it back out, I need to utf8_decode() it. Is that right, or am I missing something?

asked Sep 17 '15 at 13:09 by Chud37




1 Answer

utf8_encode and utf8_decode are pretty bad misnomers. The only thing these functions do is convert between the ISO-8859-1 and UTF-8 encodings. For text that can be represented in ISO-8859-1, they do the same thing as iconv('ISO-8859-1', 'UTF-8', $str) and iconv('UTF-8', 'ISO-8859-1', $str) respectively. There's no other magic going on that would necessitate their use.
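To make that concrete, here is a minimal pure-PHP sketch of the conversion utf8_encode() performs. The function name latin1_to_utf8 is hypothetical (it is not a PHP built-in); like utf8_encode(), it assumes every input byte is one ISO-8859-1 character. Because ISO-8859-1 code points equal their byte values, bytes below 0x80 pass through unchanged and every other byte becomes a two-byte UTF-8 sequence.

```php
<?php
// Hypothetical helper: a sketch of what utf8_encode() does.
// Each input byte is treated as one ISO-8859-1 character whose
// code point equals the byte value (U+0000..U+00FF).
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            // ASCII range: identical in both encodings.
            $out .= chr($b);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "café" as ISO-8859-1 bytes: é is the single byte 0xE9.
$latin1 = "caf\xE9";
echo bin2hex(latin1_to_utf8($latin1)), "\n"; // 636166c3a9
```

utf8_decode() is the reverse mapping; for characters that do not exist in ISO-8859-1 it has no byte to produce and substitutes a `?`.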

If you receive a UTF-8 encoded string from the browser and you want to insert it as UTF-8 into the database using a database connection with the utf8 charset set, there is absolutely no use for either function anywhere in this chain. You don't want to convert encodings anywhere here; avoiding conversion altogether should be the goal.
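Converting in that chain is not just unnecessary but harmful: the bytes a UTF-8 page submits are already UTF-8, so running them through utf8_encode() treats each byte as a separate ISO-8859-1 character and encodes it a second time. The sketch below defines a hypothetical helper, latin1_to_utf8(), that mimics utf8_encode(), and uses a hard-coded $fromBrowser string as a stand-in for a $_POST value. It shows the classic "cafÃ©" mojibake that double encoding produces.

```php
<?php
// Hypothetical stand-in for utf8_encode(): treats each byte as an
// ISO-8859-1 character and emits its UTF-8 encoding.
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    foreach (str_split($latin1) as $ch) {
        $b = ord($ch);
        $out .= $b < 0x80
            ? $ch
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Assumed example input: "café" exactly as a UTF-8 form submits it
// (é is already the two bytes C3 A9).
$fromBrowser = "caf\xC3\xA9";

// Passing it through unchanged keeps it valid UTF-8.
$stored = $fromBrowser;

// "Encoding" it again double-encodes: C3 A9 becomes C3 83 C2 A9.
$doubleEncoded = latin1_to_utf8($fromBrowser);

echo $stored, "\n";        // café
echo $doubleEncoded, "\n"; // cafÃ©
```

Decoding on the way out would only undo that damage; leaving the string alone at both ends avoids it in the first place.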

The only time you could use either function is if you need to convert from UTF-8 to ISO-8859-1 or vice versa at some point, because external data is encoded in ISO-8859-1 or an external system expects data in it. But even then, I'd prefer the explicit use of iconv or mb_convert_encoding, since it makes it more obvious what is going on. And in this day and age, UTF-8 should be the default go-to encoding you use throughout, so there should be very little need for such conversion.
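If you do have to cross that boundary, for example when reading a hypothetical ISO-8859-1 export from a legacy system, the conversion can be spelled out with mb_convert_encoding(), whose arguments are the string, the target encoding and the source encoding (iconv() takes the source encoding first, then the target, then the string). This sketch assumes the mbstring extension is loaded. Note also that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, which is one more reason to prefer the explicit form.

```php
<?php
// Assumed input: bytes received from an external system that uses
// ISO-8859-1 ("café", with é as the single byte 0xE9).
$legacy = "caf\xE9";

// Convert explicitly at the boundary: to UTF-8, from ISO-8859-1.
$utf8 = mb_convert_encoding($legacy, 'UTF-8', 'ISO-8859-1');

// From here on the application works purely in UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)

// If that external system expects ISO-8859-1 back, convert on the way out.
$backToLegacy = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump($backToLegacy === $legacy); // bool(true)
```

Keeping the conversion at the edge of the application means everything in between, including the database connection, only ever sees UTF-8.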

See:

  • What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text
  • Handling Unicode Front To Back In A Web App
  • UTF-8 all the way through
answered Sep 24 '22 at 05:09 by deceze