Assume that I need to insert the following document:
{
title: 'Péter'
}
(note the é)
It gives me an error when I use the following PHP-code ... :
$db->collection->insert(array("title" => "Péter"));
... because it needs to be utf-8.
So I should use this line of code:
$db->collection->insert(array("title" => utf8_encode("Péter")));
Now, when I request the document, I still have to decode it ... :
$document = $db->collection->findOne(array("_id" => new MongoId("__someID__")));
$title = utf8_decode($document['title']);
Is there some way to automate this process? Can I change the character encoding of MongoDB? (I'm migrating a MySQL database that uses cp1252 West European (latin1).)
I already considered changing the Content-Type header; the problem is that all static (hardcoded) strings aren't UTF-8...
Thanks in advance! Tim
MongoDB uses UTF-8 character encoding which is part of the Unicode Standard.
PHP does not natively support UTF-8: a PHP string is a plain sequence of bytes with no encoding attached to it. This is important to keep in mind when dealing with UTF-8 encoded data in PHP.
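To make that concrete, here is a small sketch of how the same name looks to PHP in the two encodings. It assumes the mbstring extension is loaded, and the bytes are written as explicit escapes so the result does not depend on how the source file itself is saved.

```php
<?php
// "Péter" as UTF-8: é is the two bytes 0xC3 0xA9.
$utf8 = "P\xC3\xA9ter";
// "Péter" as latin1 / cp1252: é is the single byte 0xE9.
$latin1 = "P\xE9ter";

// strlen() counts bytes, mb_strlen() counts characters in the given encoding.
$utf8Bytes = strlen($utf8);
$utf8Chars = mb_strlen($utf8, 'UTF-8');

// A lone 0xE9 byte followed by "t" is not a valid UTF-8 sequence,
// which is why MongoDB rejects the latin1 string.
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');
$utf8IsValidUtf8 = mb_check_encoding($utf8, 'UTF-8');
```

A check like mb_check_encoding($value, 'UTF-8') can be used to spot strings that still need converting before they are handed to the driver.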
You can add the driver to your application to work with MongoDB in PHP. The MongoDB PHP Driver consists of the two following components: the extension, which provides a low-level API and mainly serves to integrate libmongoc and libbson with PHP, and the library, which provides a higher-level API on top of the extension.
JSON and BSON can only encode / decode valid UTF-8 strings. If your data (including user input) is not UTF-8, you need to convert it before passing it to any JSON-dependent system, like this:
$string = iconv('UTF-8', 'UTF-8//IGNORE', $string); // or
$string = iconv('UTF-8', 'UTF-8//TRANSLIT', $string); // or even
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string); // not sure how this behaves
Personally I prefer the first option; see the iconv() manual page. Other alternatives include:
mb_convert_encoding()
utf8_encode(utf8_decode($string))
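Note that the iconv() calls above name UTF-8 as the input encoding, so they clean up a string that is already mostly UTF-8. For data that comes from a cp1252 source, as in the question, the source encoding has to be named explicitly. The following is a minimal sketch of that, assuming the mbstring extension is available; cp1252_to_utf8() is a hypothetical helper name, not part of PHP or the MongoDB driver.

```php
<?php
// Hypothetical helper: convert a cp1252 (Windows-1252) string to UTF-8.
function cp1252_to_utf8(string $s): string
{
    // Arguments are: string, target encoding, source encoding.
    return mb_convert_encoding($s, 'UTF-8', 'Windows-1252');
}

// "Péter" with é stored as the single cp1252 byte 0xE9.
$title = cp1252_to_utf8("P\xE9ter");

// The euro sign is byte 0x80 in cp1252 but does not exist in ISO-8859-1.
$euro = cp1252_to_utf8("\x80");
```

The euro sign is the reason to prefer this over utf8_encode() for cp1252 data: utf8_encode() assumes ISO-8859-1, which differs from cp1252 in the 0x80 to 0x9F range, so characters such as the euro sign and curly quotes would come out wrong.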
You should always make sure your strings are UTF-8 encoded, even the user-submitted ones. However, since you mentioned that you're migrating from MySQL to MongoDB: have you tried exporting your current database to CSV and using the import scripts that come with Mongo? They should handle this, provided the export itself is written out as UTF-8.
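As for automating the conversion in the application itself, one option is to convert every string in a document in one pass, right before inserting and right after reading. This is only a sketch under assumptions: convert_strings() is a hypothetical helper, the mbstring extension is assumed to be loaded, and the application data is assumed to be cp1252 throughout.

```php
<?php
// Hypothetical helper: recursively convert all strings (keys and values)
// in a document from one encoding to another.
function convert_strings($value, string $to, string $from)
{
    if (is_string($value)) {
        return mb_convert_encoding($value, $to, $from);
    }
    if (is_array($value)) {
        $out = [];
        foreach ($value as $key => $item) {
            $newKey = is_string($key) ? mb_convert_encoding($key, $to, $from) : $key;
            $out[$newKey] = convert_strings($item, $to, $from);
        }
        return $out;
    }
    // Numbers, booleans, null and objects (for example an ID object) are left untouched.
    return $value;
}

// A cp1252 document as the application sees it.
$doc = ['title' => "P\xE9ter", 'tags' => ["caf\xE9"], 'views' => 3];

// Convert before writing, and convert back after reading.
$forMongo = convert_strings($doc, 'UTF-8', 'Windows-1252');
$backToCp1252 = convert_strings($forMongo, 'Windows-1252', 'UTF-8');
```

With the code from the question, $forMongo would be what gets passed to $db->collection->insert(), and the result of findOne() would go through the reverse call. Wrapping both calls in one small class keeps the encoding logic in a single place until the rest of the application is moved to UTF-8.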
EDIT: I mentioned that BSON can only handle UTF-8. I wasn't sure this was exactly true, and had a vague idea that BSON might use UTF-16 or UTF-32 to encode / decode data, but the BSON specification does define its string type as UTF-8.