Binary serialization in pure C/C++

I'd like to implement the binary serialization on my own, without using Boost or any other third-party library.

In C++ the simplest way to achieve it is to use ofstream and then send the binary file over the network. But is there any other stream class I can use as a temporary buffer, to avoid writing a file to disk?

Also, how can I achieve that in pure C?

Secret asked Jun 12 '12 at 19:06


2 Answers

Persistence is a hard issue. It is not trivial even to serialize an object to disk. Say, for example, that you have a structure like this one in C:

struct Person {
    char name[100];
    int year;
};

This is a self-contained structure, probably the simplest case to which serialization can be applied. However, you'll have to face the following problems:

  1. The compiler's padding rules. How a compiler pads a structure in memory (so that its members are aligned and it occupies a whole number of words) is not standardized; it is implementation-defined.

  2. The way the operating system and the machine itself represent data in binary form: byte order (endianness), the size of an int, and so on. This representation changes from one machine to another.

The conclusion is that a file written by a program may not be readable even by the same program on the same operating system, if the two builds were compiled with different C compilers.
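To make problems 1 and 2 concrete, here is a small C++11 sketch (the function names are illustrative, not from any library) that inspects what a raw `fwrite`/`memcpy` dump of `Person` would depend on: how much padding the compiler added, and the host's byte order.

```cpp
#include <array>
#include <cstddef>   // std::size_t, offsetof
#include <cstring>   // std::memcpy

// The structure from the answer.
struct Person {
    char name[100];
    int year;
};

// Problem 1: the amount of padding is implementation-defined.
constexpr std::size_t padding_bytes() {
    return sizeof(Person) - sizeof(Person::name) - sizeof(Person::year);
}

// Problem 2: a raw copy of an int exposes the host's byte order and int size.
std::array<unsigned char, sizeof(int)> raw_bytes(int value) {
    std::array<unsigned char, sizeof(int)> out{};
    std::memcpy(out.data(), &value, sizeof value);
    return out;
}

// True if the least significant byte is stored first.
bool host_is_little_endian() {
    return raw_bytes(1)[0] == 1;
}
```

The exact values (`sizeof(Person)`, `offsetof(Person, year)`, `padding_bytes()`, the order of the bytes returned by `raw_bytes`) vary by compiler and platform, which is precisely why dumping the struct's memory is not portable.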

Now let's see an object in C++:

#include <string>

class Date;   // defined elsewhere
class Firm;   // defined elsewhere

class Person {
public:
    // more things...

private:
    std::string name;
    Date * birth;
    Firm * firm;
};

Now the very same thing has become much more complex. The object is no longer self-contained: you have to follow the pointers in order to decide how to deal with each referenced object (this is problem 3, known as pointer swizzling and transitive persistence). And you still have problems 1 and 2.
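The pointer problem can be sketched as follows. This is a minimal, hypothetical example (the `Snapshot`, `unswizzle`, and `swizzle` names are made up for illustration), simplified to a single `Firm*` member: on save, each pointer is followed and replaced by an index into a table, so a `Firm` shared by several persons is stored only once; on load, the indices are turned back into real pointers.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified version of the class above: one pointer member.
struct Firm {
    std::string name;
};

struct Person {
    std::string name;
    const Firm* firm;   // may be shared by several persons, or null
};

// Storable form: the pointer is replaced by an index into a firm table.
struct StoredPerson {
    std::string name;
    long firm_index;    // -1 means "no firm"
};

struct Snapshot {
    std::vector<Firm> firms;           // each distinct Firm stored exactly once
    std::vector<StoredPerson> people;
};

// Saving ("unswizzling"): follow each pointer and turn it into an index.
Snapshot unswizzle(const std::vector<Person>& people) {
    Snapshot snap;
    std::map<const Firm*, long> index_of;  // address seen -> index in snap.firms
    for (const Person& p : people) {
        long idx = -1;
        if (p.firm != nullptr) {
            auto it = index_of.find(p.firm);
            if (it != index_of.end()) {
                idx = it->second;              // already stored: reuse its index
            } else {
                idx = static_cast<long>(snap.firms.size());
                index_of.emplace(p.firm, idx);
                snap.firms.push_back(*p.firm); // transitive: store the pointee too
            }
        }
        snap.people.push_back({p.name, idx});
    }
    return snap;
}

// Loading ("swizzling"): turn indices back into real pointers.
// The result points into snap.firms, so snap must outlive it.
std::vector<Person> swizzle(const Snapshot& snap) {
    std::vector<Person> out;
    for (const StoredPerson& sp : snap.people) {
        const Firm* f = nullptr;
        if (sp.firm_index >= 0)
            f = &snap.firms[static_cast<std::size_t>(sp.firm_index)];
        out.push_back({sp.name, f});
    }
    return out;
}
```

This sketch does not handle cycles or pointers inside `Firm`, and the `Snapshot` itself would still have to be written out with elementary read/write functions for ints and strings.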

So let's say that you focus on self-contained objects and still need a solution for problems 1 and 2. The only way to go is to decide on a representation, either a) a text format or b) a "bytecode" format, i.e. a binary format whose byte layout you define yourself. A bytecode format can be understood by any program on any operating system, compiled with any C compiler, because the information is read and written byte by byte in the order the format specifies, not in the machine's native layout. This is the way Java or C# serialize their objects. A text format is as valid a representation as bytecode, though slower. Its main advantage is that it can be understood by a human being as well as by the computer (a structured text format could be XML).
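Here is what a) and b) look like for a single unsigned value, as a minimal C++ sketch with illustrative function names. The byte-oriented version fixes a big-endian order and builds the bytes with shifts on the value, so it produces the same four bytes whatever the host's endianness; the text version sidesteps byte order entirely.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// b) byte-oriented: shifts act on the value, not on its memory layout,
// so the output is identical on little- and big-endian machines.
std::vector<unsigned char> encode_u32_be(std::uint32_t v) {
    return {static_cast<unsigned char>(v >> 24), static_cast<unsigned char>(v >> 16),
            static_cast<unsigned char>(v >> 8), static_cast<unsigned char>(v)};
}

// Assumes b holds at least 4 bytes.
std::uint32_t decode_u32_be(const std::vector<unsigned char>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}

// a) text: human-readable and free of byte-order issues, but variable length.
std::string encode_text(std::uint32_t v) { return std::to_string(v); }

std::uint32_t decode_text(const std::string& s) {
    return static_cast<std::uint32_t>(std::stoul(s));
}
```

Any fixed order works, little-endian included, as long as writer and reader agree; big-endian is the conventional "network byte order".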

So, in order to serialize your self-contained objects, whatever output format you choose, you need basic functions (or classes in C++) that are able to write and read ints, chars, strings, and so on. Once you have a write/read pair for each one, you have to give the programmer the ability to create her own write/read pairs for her objects, built on your read/write pairs for elementary data.
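The write/read pairs described above might look like this. It is a hypothetical design, not any library's API: a `Writer` and `Reader` over an in-memory byte vector, handling fixed-width integers in big-endian order and length-prefixed strings, plus a user-defined pair for the `Person` struct from the beginning of the answer that is built only from the elementary ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Elementary write functions: big-endian integers, length-prefixed strings.
class Writer {
public:
    void u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
    void i32(std::int32_t v) { u32(static_cast<std::uint32_t>(v)); }
    void str(const std::string& s) {
        u32(static_cast<std::uint32_t>(s.size()));
        bytes_.insert(bytes_.end(), s.begin(), s.end());
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};

// The matching read functions; they throw if the input is too short.
class Reader {
public:
    explicit Reader(const std::vector<unsigned char>& bytes) : bytes_(bytes) {}
    std::uint32_t u32() {
        need(4);
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v = (v << 8) | bytes_[pos_++];
        return v;
    }
    // uint32 -> int32 wraps modulo 2^32 (guaranteed since C++20,
    // implementation-defined before, but what mainstream compilers do).
    std::int32_t i32() { return static_cast<std::int32_t>(u32()); }
    std::string str() {
        const std::size_t n = u32();
        need(n);
        auto first = bytes_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::string s(first, first + static_cast<std::ptrdiff_t>(n));
        pos_ += n;
        return s;
    }

private:
    void need(std::size_t n) const {
        if (bytes_.size() - pos_ < n)
            throw std::runtime_error("truncated input");
    }
    const std::vector<unsigned char>& bytes_;
    std::size_t pos_ = 0;
};

// The struct from the answer, with a fixed-width year.
struct Person {
    char name[100];
    std::int32_t year;
};

// User-defined pair, built only from the elementary ones.
void write_person(Writer& w, const Person& p) {
    const char* end = std::find(p.name, p.name + sizeof p.name, '\0');
    w.str(std::string(p.name, end));
    w.i32(p.year);
}

Person read_person(Reader& r) {
    Person p{};                        // zero-filled, so name stays NUL-terminated
    const std::string name = r.str();
    if (name.size() >= sizeof p.name)
        throw std::runtime_error("name too long");
    std::memcpy(p.name, name.data(), name.size());
    p.year = r.i32();
    return p;
}
```

Because the user-level functions only call the elementary ones, switching the underlying format (say, to text) means changing `Writer` and `Reader`, not every user class.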

We are talking here about a complete framework, something like what Python offers with its pickle module.

Finally, being able to keep your serialized data in memory instead of saving it to disk is the least of your problems. You could use the ostringstream class if you are using a text-based format, or a memory block if you are using bytecode (an ostringstream can also hold raw bytes written with write(), since a std::string may contain any byte value).
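Both options in code, as a sketch with illustrative function names: an `std::ostringstream` collecting text, and the same class collecting raw bytes through `write()`, read back with an `std::istringstream`. Nothing touches the disk.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Text format in memory: the same << syntax you would use with an ofstream.
std::string person_as_text(const std::string& name, int year) {
    std::ostringstream out;
    out << name << '\n' << year << '\n';
    return out.str();
}

// Binary bytes in memory: ostream::write takes raw bytes, and std::string
// can hold any byte value (including '\0').
std::string u32_as_bytes(std::uint32_t v) {
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(v >> 24), static_cast<unsigned char>(v >> 16),
        static_cast<unsigned char>(v >> 8), static_cast<unsigned char>(v)};
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(bytes), sizeof bytes);
    return out.str();
}

// Reading the bytes back from the in-memory buffer.
std::uint32_t u32_from_bytes(const std::string& buf) {
    std::istringstream in(buf);
    unsigned char b[4] = {};
    in.read(reinterpret_cast<char*>(b), sizeof b);
    if (in.gcount() != 4)
        throw std::runtime_error("truncated input");
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}
```

The resulting string's `data()` and `size()` can then be handed to whatever sends data over the network. In pure C, the equivalent memory block is a plain `unsigned char` array (or a `malloc`ed buffer) plus a length counter, filled with the same shift expressions.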

As you can see, it is not a simple job. Hope this helps.

Baltasarq answered Sep 28 '22 at 05:09


I have been using JSON for serializing data. It is simple, which is a very good thing. It is easy to get JSON right, and easy to tell if anything goes wrong with it.

It is not as space-efficient as other formats, but for many purposes it is good enough. And there is free library code you can get from the JSON web site.

http://json.org/
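The question rules out third-party code, and for a simple record JSON is easy to emit by hand. A minimal sketch, assuming a record with a name and a year (like the struct in the other answer) and UTF-8 input; the function names are illustrative, and it only writes JSON, since parsing it back is considerably more work.

```cpp
#include <cstdio>
#include <string>

// Quote a string for JSON: escape quotes, backslashes and control characters.
// Other bytes (including UTF-8 sequences) are copied unchanged.
std::string json_escape(const std::string& s) {
    std::string out = "\"";
    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {               // other control chars -> \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += ch;
                }
        }
    }
    out += '"';
    return out;
}

std::string person_to_json(const std::string& name, int year) {
    return "{\"name\":" + json_escape(name) + ",\"year\":" + std::to_string(year) + "}";
}
```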

steveha answered Sep 28 '22 at 04:09