In what circumstances should you serialize data? When should you not?

Question

I'm aware that serialization converts data structures into a storable format, for purposes such as caching.

What I'm asking more specifically is: in what circumstances should you actually decide to store serialized data (using serialize() in PHP, the pickle module in Python, et cetera)?

Let's say we have a high-traffic website, and on our /blog page we use static-content XML files, a gettext mo file, and dynamically generated content from a database.

Example #1:

The file we rely on for static content is en/blog.xml:

'<content><![CDATA[
<h1>Welcome to my blog!</h1>
<p>Lorem ipsum dolor sit amet..</p>

]]></content>'

Would we want to serialize this XML file itself and store it in a cache?

Example #2:

We also have a dynamically generated form. Normally I would assume I should not serialize anything, because it's generated server-side and dynamic. But our form field labels are internationalized and the user requested this page in Spanish, so we use a translation class that grabs form field labels stored in mo/csv/xml format.

Contents of contact-us.php:

<label for="first_name"><?php echo $L->_("First Name");?></label>
<input id="first_name" name="first_name" type="text">

The "First Name" message id translation is pulled from the application-level translation file, which we parse and store in an array that resides in our translation class. So would it be ideal for our code not to parse the mo file on every page request, and instead serialize the whole array after parsing the mo file once, then rely on that serialized dump?

Example #3:

Let's say on our blog page we're pulling in the 5 most recent blog posts.

$posts = BlogClass::sql('SELECT blog_message, blog_author FROM blog_posts ORDER BY blog_date DESC LIMIT 5');

Would we want to rely on something like memcache and just set a key to the result of the SQL statement? Would it serialize the results of the query for us?

Bonus:

If anyone could provide specific examples of efficient or practical uses (and misuses) of serialization, that'd be great: something like a huge multi-page form that pulls in database information and stores things in sessions, or any case where you had to rely on serialize.

hobodave · Accepted Answer

Example 1

Profile.

Is it prohibitively costly to generate your content pages?
Is it significantly less costly to deserialize your generated content?

If both answers are yes, consider it.

Example 2

Profile.

Is it prohibitively costly to generate your content pages?
Is it significantly less costly to deserialize your generated content?

If both answers are yes, consider it.
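
One way to answer those two questions is to time both paths. Here is a minimal Python sketch, assuming a hypothetical parse_translations() that stands in for the real mo/XML parser (it only parses a small made-up CSV string). The numbers it prints depend entirely on your data and machine.

```python
import csv
import io
import pickle
import timeit

# Made-up translation data; a real mo or XML file would be parsed differently.
RAW = "First Name,Nombre\nLast Name,Apellido\n" * 1000

def parse_translations(raw):
    """Parse 'msgid,translation' rows into a dict (the uncached path)."""
    return {msgid: text for msgid, text in csv.reader(io.StringIO(raw))}

def load_cached(blob):
    """Deserialize a previously pickled dict (the cached path)."""
    return pickle.loads(blob)

# Serialize once, as a cache-warming step would.
blob = pickle.dumps(parse_translations(RAW))

# Time both paths over the same number of runs.
parse_time = timeit.timeit(lambda: parse_translations(RAW), number=50)
load_time = timeit.timeit(lambda: load_cached(blob), number=50)
print(f"parse: {parse_time:.4f}s  unpickle: {load_time:.4f}s")
```

If the two timings are close, the cache buys little and is probably not worth the extra moving part.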

Example 3

Profile.

Is that query prohibitively expensive?
Is it significantly faster to grab the data from memcached?

If both answers are yes, consider it.
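
If profiling says yes, the usual shape is cache-aside: check the cache, fall back to the query on a miss, then store the serialized result with an expiry. A sketch in Python, using a plain dict as a stand-in for memcached and a hypothetical fetch_recent_posts() in place of the real query:

```python
import pickle
import time

_cache = {}      # stand-in for memcached: key -> (expires_at, serialized bytes)
query_count = 0  # counts how often the "database" is actually hit

def fetch_recent_posts():
    """Hypothetical stand-in for the SQL query in Example 3."""
    global query_count
    query_count += 1
    return [{"blog_author": "meder", "blog_message": "Hello"}]

def get_recent_posts(ttl=60):
    """Return cached posts if still fresh, otherwise query and cache them."""
    entry = _cache.get("recent_posts")
    if entry is not None and entry[0] > time.time():
        return pickle.loads(entry[1])
    posts = fetch_recent_posts()
    _cache["recent_posts"] = (time.time() + ttl, pickle.dumps(posts))
    return posts

first = get_recent_posts()
second = get_recent_posts()  # served from the cache, no second query
```

Many memcached client libraries serialize non-string values for you, so the explicit pickle calls here are only meant to make that step visible.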

Bonus

I never serialize my data just because I can. I need to have a reason to do so, otherwise it's just premature optimization. There are several factors that come into deciding whether this should be done.

Performing sorting or other operations on a serialized set of data

This will almost always be a bad idea. For example, if you serialized a result set from a database and then needed to reorder it by some field, you're shooting yourself in the foot.
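
To illustrate the cost: a serialized blob is opaque, so any reordering means a full deserialize, sort, and reserialize round trip. A sketch in Python with made-up rows:

```python
import pickle

# Made-up rows standing in for a database result set.
rows = [
    {"author": "carol", "date": "2009-07-03"},
    {"author": "alice", "date": "2009-07-01"},
    {"author": "bob", "date": "2009-07-02"},
]
blob = pickle.dumps(rows)

def resort(blob, field):
    """Reorder a pickled list of dicts: unpack everything, sort, repack."""
    data = pickle.loads(blob)              # full deserialize
    data.sort(key=lambda row: row[field])  # the actual work
    return pickle.dumps(data)              # full reserialize

by_date = pickle.loads(resort(blob, "date"))
print([row["author"] for row in by_date])  # ['alice', 'bob', 'carol']
```

Asking the database for the order you need in the first place is usually the simpler route.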

Messaging

If you need to communicate serialized data to other services or languages, then the choice of serialization format is critical. I avoid language-specific serialization if I know or suspect that other things may need to read the data. JSON is often an ideal format for cross-language serialization.
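
As a small illustration in Python: pickle produces a Python-specific byte stream, while json.dumps produces plain text that a consumer written in PHP, JavaScript, or Java can parse. The message fields are made up.

```python
import json
import pickle

# Made-up message that another service might need to consume.
message = {"event": "post_created", "post_id": 42, "tags": ["php", "python"]}

as_pickle = pickle.dumps(message)              # bytes in a Python-only format
as_json = json.dumps(message, sort_keys=True)  # plain text, language-neutral

print(as_json)

# Round trip: a JSON consumer gets back an equal structure.
restored = json.loads(as_json)
```

The JSON form is also the one you can read in a log or a cache dump without any tooling.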

Updating serialized data

You have to be willing to regenerate the serialized data whenever its source is updated. Doing any kind of complex update to the serialized data itself will be prohibitively expensive.
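
In practice that usually means treating the serialized copy as disposable and rebuilding it whenever the source changes. A sketch in Python, assuming the source is a file and using its modification time as the staleness check; build_from_source() and the file names are hypothetical placeholders:

```python
import os
import pickle
import tempfile

def build_from_source(path):
    """Hypothetical expensive step: here, just read lines into a list."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().splitlines()

def load(path, cache_path):
    """Use the pickled copy unless the source file is newer than it."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    data = build_from_source(path)  # regenerate from scratch, no patching
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "labels.txt")
    cache = os.path.join(tmp, "labels.pickle")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("First Name\nLast Name\n")
    fresh = load(src, cache)   # builds the data and writes the cache
    cached = load(src, cache)  # reads the pickled copy
```

The point of the design is that nothing ever edits the pickled file in place: it is either reused whole or thrown away and rebuilt.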

Human readability

If you need to read it easily, I suggest avoiding language-specific formats and using JSON instead.

Edit:

I just looked again at the query in Example 3. It is an extremely simple query: you're selecting only two fields and ordering by a date field. With a properly indexed table this query should be trivial, and I would not suggest caching something like this in memcached.

Tags: python, php, serialization

meder omuraliev
