
Is the PHP serialize function compatible with UTF-8?

I have a site I want to migrate from ISO to UTF-8.

I have a database record indexed by the following primary key:

s:22:"Informations générales";

The problem is that now, with UTF-8, serializing the same string gives:

s:24:"Informations générales";

(Notice that the size is now the number of bytes, not the number of characters.)

So this is not compatible with the earlier non-UTF-8 records!

Did I do something wrong? How can I fix this?

Thanks

Matthieu Napoli asked Mar 30 '10 07:03



2 Answers

The behaviour is completely correct. Two strings with different encodings will generate different byte streams, thus different serialization strings.
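
To make the byte counts concrete: é is one byte in ISO-8859-1 and two bytes in UTF-8, and the sample string contains two of them, which accounts for 22 versus 24. A minimal sketch (the variable names are illustrative, and the strings are written with byte escapes so the result does not depend on how the file is saved):

```php
<?php
// The same text in two encodings.
$latin1 = "Informations g\xE9n\xE9rales";         // é is the single byte 0xE9
$utf8   = "Informations g\xC3\xA9n\xC3\xA9rales"; // é is the two bytes 0xC3 0xA9

// strlen() counts bytes, and serialize() writes that byte count.
echo strlen($latin1), "\n"; // 22
echo strlen($utf8), "\n";   // 24

$serializedLatin1 = serialize($latin1); // starts with s:22:
$serializedUtf8   = serialize($utf8);   // starts with s:24:

// UTF-8 bytes combined with the old length prefix no longer parse.
$stale = 's:22:"' . $utf8 . '";';
var_dump(@unserialize($stale)); // bool(false)
```

This is why records whose bytes were converted to UTF-8 without updating the length prefix fail to unserialize.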

soulmerge answered Sep 19 '22 19:09


Dump the database in latin1.

On the command line:

sed -e 's/latin1/utf8/g' -i ./DBNAME.sql

Import the converted file into a new UTF-8 database.

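Use a PHP script to update each field. Run a query, loop through each field, and recalculate the length prefixes in the serialized string. On PHP 7 and later, where the /e modifier no longer exists, use preg_replace_callback:

$str = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; }, $str);

After that, I was able to use unserialize() and everything worked with UTF-8.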
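
One caveat with the regular-expression approach is that a string value which itself contains `";` can confuse the pattern. An alternative, sketched here under stated assumptions, is to unserialize each record while it is still in its original bytes (where the lengths are correct), convert every string, and serialize again. The helper names `latin1_to_utf8_deep` and `reserialize_as_utf8` are hypothetical. The sketch assumes the old data is ISO-8859-1, holds only scalars and arrays (objects are not handled), and that the iconv extension is available:

```php
<?php
// Hypothetical helper: recursively convert every string in a value
// from ISO-8859-1 to UTF-8, including string array keys.
function latin1_to_utf8_deep($value)
{
    if (is_string($value)) {
        return iconv('ISO-8859-1', 'UTF-8', $value);
    }
    if (is_array($value)) {
        $out = [];
        foreach ($value as $k => $v) {
            $key = is_string($k) ? iconv('ISO-8859-1', 'UTF-8', $k) : $k;
            $out[$key] = latin1_to_utf8_deep($v);
        }
        return $out;
    }
    return $value; // ints, floats, bools and null pass through unchanged
}

// Hypothetical helper: re-encode one serialized record.
// The input must still be the original Latin-1 bytes.
function reserialize_as_utf8(string $latin1Serialized): string
{
    $value = @unserialize($latin1Serialized, ['allowed_classes' => false]);
    if ($value === false && $latin1Serialized !== 'b:0;') {
        throw new InvalidArgumentException('Record could not be unserialized');
    }
    // serialize() recomputes every length prefix from the new bytes.
    return serialize(latin1_to_utf8_deep($value));
}

$old = "s:22:\"Informations g\xE9n\xE9rales\";";
$new = reserialize_as_utf8($old);
echo $new, "\n"; // s:24:"Informations générales"; (as UTF-8 bytes)
```

Because serialize() computes the lengths itself, nothing has to be patched by hand. The trade-off is that the script has to read the old records before their bytes are converted.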

Rulo answered Sep 18 '22 19:09