MongoDB PHP UTF-8 problems

Tags: php, mongodb, utf-8

Assume that I need to insert the following document:

{
    title: 'Péter'
}

(note the é)

It gives me an error when I use the following PHP code:

$db->collection->insert(array("title" => "Péter"));

This is because the string needs to be UTF-8.

So I should use this line of code:

$db->collection->insert(array("title" => utf8_encode("Péter")));

Now, when I fetch the document, I still have to decode it:

$document = $db->collection->findOne(array("_id" => new MongoId("__someID__")));
$title = utf8_decode($document['title']);

Is there some way to automate this process? Can I change the character encoding MongoDB uses? (I'm migrating a MySQL database that uses cp1252, West European / latin1.)

I already considered changing the Content-Type header, but the problem is that all my static (hardcoded) strings aren't UTF-8...

Thanks in advance! Tim

asked May 07 '11 at 11:05 by cutsoy


People also ask

Does MongoDB support UTF-8?

Yes. MongoDB stores documents as BSON, and BSON string values must be UTF-8 encoded (UTF-8 being one of the encodings defined by the Unicode Standard).
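Because of this, it can help to check a value before handing it to the driver. A minimal sketch, assuming nothing beyond PHP's bundled PCRE: with the /u modifier, preg_match() validates the subject and returns false for malformed UTF-8. isValidUtf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true only if $s is a well-formed UTF-8 byte sequence.
// With the /u modifier PCRE validates the subject; preg_match() returns false on malformed input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("P\xC3\xA9ter")); // bool(true)  ("Péter" as UTF-8)
var_dump(isValidUtf8("P\xE9ter"));     // bool(false) ("Péter" as cp1252 / latin1)
```

If the mbstring extension is available, mb_check_encoding($s, 'UTF-8') performs the same check.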

Does PHP support UTF-8?

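Only partially. PHP strings are plain byte sequences with no encoding attached, and core string functions such as strlen() and substr() operate on bytes, not characters. UTF-8-aware handling comes from extensions such as mbstring and from the /u modifier in PCRE. This is important to keep in mind when dealing with UTF-8 encoded data in PHP.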
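A short illustration of what "byte-oriented" means. The string is written with escape sequences so that the encoding of the source file itself (the problem with hardcoded strings mentioned in the question) does not affect the result:

```php
<?php
// "Péter" as UTF-8 bytes: é is the two-byte sequence C3 A9.
$title = "P\xC3\xA9ter";

// strlen() counts bytes, not characters.
echo strlen($title), "\n";                   // 6

// Counting code points with a UTF-8-aware regex (the /u modifier).
echo preg_match_all('/./us', $title), "\n";  // 5

// Byte-oriented functions can cut a multi-byte character in half:
// substr($title, 0, 2) is "P\xC3", which is no longer valid UTF-8.
var_dump(preg_match('//u', substr($title, 0, 2)) === 1); // bool(false)
```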

Can PHP work with MongoDB?

Yes. To work with MongoDB from PHP, add the MongoDB PHP driver to your application. It consists of two components: the extension, which provides a low-level API and mainly serves to integrate libmongoc and libbson with PHP, and the library (the mongodb/mongodb Composer package), which provides a higher-level API on top of the extension.


1 Answer

JSON and BSON can only encode and decode valid UTF-8 strings. If your data (including input) is not UTF-8, you need to convert it before passing it to anything that depends on JSON or BSON, like this:

$string = iconv('UTF-8', 'UTF-8//IGNORE', $string);          // drop invalid byte sequences, or
$string = iconv('UTF-8', 'UTF-8//TRANSLIT', $string);        // transliterate, or even
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string); // both; exact behavior depends on the iconv implementation
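A minimal sketch for the situation in the question, assuming the data arrives as cp1252 bytes from the MySQL database. It shows json_encode() rejecting the raw bytes and accepting them once iconv() has converted them, with 'Windows-1252' (the real source encoding) as the first argument instead of 'UTF-8'. $fromMysql is an illustrative variable name.

```php
<?php
// Assumption: these bytes come from the cp1252 ("West European (latin1)") MySQL database,
// where é is stored as the single byte E9.
$fromMysql = "P\xE9ter";

// JSON, like BSON, requires valid UTF-8, so json_encode() rejects the raw bytes.
var_dump(json_encode(['title' => $fromMysql]));   // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)

// The first iconv() argument is the source encoding, so name the real one here.
$title = iconv('Windows-1252', 'UTF-8', $fromMysql);

echo bin2hex($title), "\n";                       // 50c3a9746572
echo json_encode(['title' => $title]), "\n";      // {"title":"P\u00e9ter"}
```

With 'UTF-8' as the source encoding instead, the lone E9 byte is treated as a malformed sequence, so it is dropped or rejected (depending on the iconv implementation) rather than converted to é.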

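Personally I prefer the first iconv() call (//IGNORE); see the iconv() manual page. Note that the first argument is the source encoding, so these calls only clean up input that is already supposed to be UTF-8. For data coming from a cp1252 database, pass 'Windows-1252' as the source encoding instead. Other alternatives include: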

  • mb_convert_encoding()
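  • utf8_encode(utf8_decode($string)) (this round-trips through ISO-8859-1, so characters outside that charset become '?'; both functions are deprecated as of PHP 8.2)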
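The source encoding matters because utf8_encode() assumes ISO-8859-1, while the MySQL data in the question is cp1252. The two agree everywhere except bytes 0x80 to 0x9F, where cp1252 places characters such as € and curly quotes. A sketch using mb_convert_encoding(), assuming the mbstring extension is installed:

```php
<?php
// Byte 0x80 is the euro sign in Windows-1252 but a C1 control character in ISO-8859-1.
$euro = "\x80";

// Converting with the correct source encoding yields U+20AC (€).
$asCp1252 = mb_convert_encoding($euro, 'UTF-8', 'Windows-1252');

// Treating the same byte as ISO-8859-1 (what utf8_encode() assumes) yields U+0080 instead.
$asLatin1 = mb_convert_encoding($euro, 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e282ac
echo bin2hex($asLatin1), "\n"; // c280

// Outside 0x80-0x9F, e.g. é (E9), both source encodings give the same result.
var_dump(
    mb_convert_encoding("\xE9", 'UTF-8', 'Windows-1252')
    === mb_convert_encoding("\xE9", 'UTF-8', 'ISO-8859-1')
); // bool(true)
```

This is why "Péter" survives utf8_encode() unharmed, while a cp1252 € or curly quote would not.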

You should always make sure your strings are UTF-8 encoded, including user-submitted ones. Since you mentioned that you're migrating from MySQL to MongoDB, have you tried exporting your current database to CSV and loading it with MongoDB's mongoimport tool? Keep in mind that mongoimport expects UTF-8 input, so export the data as UTF-8 (or convert the CSV first) rather than as cp1252.
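If the migration is done in PHP instead of through CSV, the conversion can be applied once per row before insert(), so every string is stored in MongoDB as UTF-8 from the start. A sketch, assuming rows arrive as (possibly nested) PHP arrays of cp1252 strings; toUtf8Document() is a hypothetical helper, not part of any driver:

```php
<?php
// Hypothetical helper: convert every string value in a (possibly nested) row
// from the given source encoding to UTF-8. Non-string values are left untouched.
function toUtf8Document(array $row, string $from = 'Windows-1252'): array
{
    array_walk_recursive($row, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = iconv($from, 'UTF-8', $value);
        }
    });
    return $row;
}

// Example row as it might come out of the cp1252 MySQL database.
$row = ['title' => "P\xE9ter", 'tags' => ["caf\xE9", 'php'], 'year' => 2011];
$doc = toUtf8Document($row);

echo json_encode($doc), "\n"; // {"title":"P\u00e9ter","tags":["caf\u00e9","php"],"year":2011}
```

Array keys are not converted, and objects inside the row (such as MongoId instances) are passed through unchanged. Whether reads still need decoding afterwards depends on the encoding your pages are served in.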


EDIT: I wasn't sure whether BSON really only handles UTF-8 (I had a vague idea that it might use UTF-16 or UTF-32), but the BSON specification confirms it: string values are stored as UTF-8.

answered Sep 24 '22 at 07:09 by Alix Axel