
Serialize Data Structures in C

I'd like a C library that can serialize my data structures to disk, and then load them again later. It should accept arbitrarily nested structures, possibly with circular references.

I presume that this tool would need a configuration file describing my data structures. The library is allowed to use code generation, although I'm fairly sure it's possible to do this without it.

Note I'm not interested in data portability. I'd like to use it as a cache, so I can rely on the environment not changing.

Thanks.


Results

Someone suggested Tpl which is an awesome library, but I believe that it does not do arbitrary object graphs, such as a tree of Nodes that each contain two other Nodes.

Another candidate is Eet, which is a project of the Enlightenment window manager. Looks interesting but, again, seems not to have the ability to serialize nested structures.

Daniel Lucraft asked Dec 16 '08 14:12



2 Answers

Check out tpl. From the overview:

Tpl is a library for serializing C data. The data is stored in its natural binary form. The API is small and tries to stay "out of the way". Compared to using XML, tpl is faster and easier to use in C programs. Tpl can serialize many C data types, including structures.

Robert Gamble answered Sep 22 '22 07:09


I know you're asking for a library. If you can't find one (::boggle::, you'd think this was a solved problem!), here is an outline for a solution:

You should be able to write a code generator[1] to serialize trees/graphs without (run-time) pre-processing fairly simply.

You'll need to parse the node structure (typedef handling?) and write the included data values in a straightforward fashion, but treat the pointers with some care.

  • For pointers to other objects (e.g. char *name;) which you know are singly referenced, you can serialize the target data directly.

  • For objects that might be multiply referenced, and for other nodes of your tree, you'll have to represent the pointer structure. Each object gets assigned a serialization number, which is what is written out in place of the pointer. Maintain a translation structure between current memory position and serialization number. On encountering a pointer, see if its target has already been assigned a number; if not, give it one and queue that object up for serialization.
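As a rough sketch of the numbering scheme in the second bullet (not taken from tpl or any other library), something like the following could work. The node type, MAX_NODES, id_table, id_for and save_graph are all names made up for illustration, and the file layout (a count followed by fixed-size records) is an assumption. Number 0 is reserved for NULL, and the table of seen pointers doubles as the work queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node type: each node points at two other nodes. */
typedef struct node {
    int value;
    struct node *left;
    struct node *right;
} node;

#define MAX_NODES 1024 /* assumed fixed capacity, to keep the sketch short */

/* Translation structure: memory address -> serialization number. */
typedef struct {
    const node *seen[MAX_NODES]; /* seen[i] has number i + 1 */
    size_t count;                /* numbers handed out so far */
} id_table;

/* Return the number for p (0 means NULL), assigning the next free
   number the first time p is seen. Linear search, fine for a sketch. */
static unsigned id_for(id_table *t, const node *p)
{
    if (p == NULL)
        return 0;
    for (size_t i = 0; i < t->count; i++)
        if (t->seen[i] == p)
            return (unsigned)(i + 1);
    assert(t->count < MAX_NODES);
    t->seen[t->count++] = p;
    return (unsigned)t->count;
}

/* Write every node reachable from root; returns the node count or -1.
   Cycles are harmless because a node is only queued on first sight. */
static int save_graph(const node *root, FILE *out)
{
    id_table t = { .count = 0 };

    /* Number everything first; seen[] grows while we walk it. */
    id_for(&t, root);
    for (size_t next = 0; next < t.count; next++) {
        id_for(&t, t.seen[next]->left);
        id_for(&t, t.seen[next]->right);
    }

    unsigned count = (unsigned)t.count;
    if (fwrite(&count, sizeof count, 1, out) != 1)
        return -1;

    /* Records are written in number order, with numbers in place of pointers. */
    for (size_t i = 0; i < t.count; i++) {
        const node *n = t.seen[i];
        unsigned ids[2] = { id_for(&t, n->left), id_for(&t, n->right) };
        if (fwrite(&n->value, sizeof n->value, 1, out) != 1 ||
            fwrite(ids, sizeof ids[0], 2, out) != 2)
            return -1;
    }
    return (int)count;
}
```

For a large graph you would want a hash table keyed on the address instead of the linear search, but the idea stays the same.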

Reading back also requires a node-number/memory-location translation step, and might be easier to do in two passes: first regenerate the nodes with the node numbers in the pointer slots (be warned, these are not valid pointers yet) to find out where each node gets put, then walk the structure again fixing up the pointers.
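A minimal sketch of the two-pass read, under assumptions of my own: the file holds an unsigned count followed by count records of {int value; unsigned left; unsigned right}, where number 0 means NULL and number k means the k-th record. Note one deliberate swap: instead of parking the numbers in the pointer slots as described above, this version keeps them in a side array, which avoids ever holding an invalid pointer. The node type and load_graph are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type: each node points at two other nodes. */
typedef struct node {
    int value;
    struct node *left;
    struct node *right;
} node;

/* Rebuild a graph from the assumed layout; returns node number 1
   (the root) or NULL on error or for an empty file. */
static node *load_graph(FILE *in)
{
    unsigned count;
    if (fread(&count, sizeof count, 1, in) != 1 || count == 0)
        return NULL;

    node **where = calloc(count, sizeof *where);     /* number -> address */
    unsigned (*ids)[2] = calloc(count, sizeof *ids); /* pending numbers */
    int ok = (where != NULL && ids != NULL);

    /* Pass 1: create each node and note where it ended up. */
    for (unsigned i = 0; ok && i < count; i++) {
        where[i] = calloc(1, sizeof *where[i]);
        ok = where[i] != NULL
          && fread(&where[i]->value, sizeof where[i]->value, 1, in) == 1
          && fread(ids[i], sizeof ids[i][0], 2, in) == 2
          && ids[i][0] <= count && ids[i][1] <= count;
    }

    /* Pass 2: every address is known now, so fix up the pointers. */
    for (unsigned i = 0; ok && i < count; i++) {
        where[i]->left  = ids[i][0] ? where[ids[i][0] - 1] : NULL;
        where[i]->right = ids[i][1] ? where[ids[i][1] - 1] : NULL;
    }

    node *root = ok ? where[0] : NULL;
    if (!ok && where != NULL)
        for (unsigned i = 0; i < count; i++)
            free(where[i]); /* free(NULL) is a no-op */
    free(where);
    free(ids);
    return root;
}
```

Freeing a loaded graph that contains cycles needs its own visited set (or the where table kept around), which this sketch leaves out.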

I don't know anything about tpl, but you might be able to piggy-back on it.


The on-disk/network format should probably be framed with some type information. You'll need a name-mangling scheme.
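One way the framing might look, as a sketch only: a small fixed header carrying a format marker, a type tag, and the record size, checked before any records are read. The "SER1" marker, the frame_header layout and both function names are invented here, and using the plain type name as the tag is just the simplest possible mangling scheme. Since the question only needs a same-machine cache, the header is written as raw bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_LEN 32 /* assumed maximum tag length, including the NUL */

/* Hypothetical frame header written before the records. */
typedef struct {
    char     magic[4];      /* "SER1", an invented format marker */
    char     type[TAG_LEN]; /* type tag, NUL-padded */
    unsigned record_size;   /* sizeof the struct when it was written */
} frame_header;

/* Write the header; returns 0 on success, -1 on failure. */
static int write_header(FILE *out, const char *type, unsigned record_size)
{
    frame_header h;
    memset(&h, 0, sizeof h);
    memcpy(h.magic, "SER1", 4);
    strncpy(h.type, type, TAG_LEN - 1); /* long tags are truncated */
    h.record_size = record_size;
    return fwrite(&h, sizeof h, 1, out) == 1 ? 0 : -1;
}

/* Read the header and compare it with what the caller expects;
   returns 0 on a match, -1 on a read error or any mismatch. */
static int check_header(FILE *in, const char *type, unsigned record_size)
{
    frame_header h;
    if (fread(&h, sizeof h, 1, in) != 1)
        return -1;
    if (memcmp(h.magic, "SER1", 4) != 0)
        return -1;
    h.type[TAG_LEN - 1] = '\0';
    if (strncmp(h.type, type, TAG_LEN - 1) != 0)
        return -1;
    return h.record_size == record_size ? 0 : -1;
}
```

A cache file whose header does not match can simply be thrown away and rebuilt.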


[1] ROOT uses this mechanism to provide very flexible serialization support in C++.


Late addition: It occurs to me that this is not always as easy as I implied above. Consider the following (contrived and badly designed) declaration:

enum {
   mask_none = 0x00,
   mask_something = 0x01,
   mask_another = 0x02,
   /* ... */
   mask_all = 0xff
};

typedef struct mask_map {
   int mask_val;
   char *mask_name;
} mask_map_t;

mask_map_t mask_list[] = {
   {mask_something, "mask_something"},
   {mask_another, "mask_another"},
   /* ... */
};

struct saved_setup {
   char* name;
   /* various configuration data */
   char* mask_name;
   /* ... */
};

and assume that we initialize our struct saved_setup items so that mask_name points at mask_list[foo].mask_name.

When we go to serialize the data, what do we do with struct saved_setup.mask_name?

You will need to take care in designing your data structures and/or bring some case-specific intelligence to the serialization process.
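For this particular contrived case, one possible piece of "case-specific intelligence" is to store the index into mask_list rather than the string, and turn it back into a pointer on load. This is only a sketch: the two helper functions and MASK_COUNT are hypothetical, and the table is cut down to two entries so that the example is complete.

```c
#include <assert.h>
#include <stddef.h>

enum { mask_something = 0x01, mask_another = 0x02 };

typedef struct mask_map {
    int mask_val;
    char *mask_name;
} mask_map_t;

/* Shortened copy of the table from the declaration above. */
static mask_map_t mask_list[] = {
    { mask_something, "mask_something" },
    { mask_another,   "mask_another"   },
};

#define MASK_COUNT (sizeof mask_list / sizeof mask_list[0])

/* On save: map the pointer to its index in mask_list. Returns -1 if p
   points somewhere else, in which case the caller would have to fall
   back to writing the string itself. Compares addresses, not contents. */
static int mask_name_to_index(const char *p)
{
    for (size_t i = 0; i < MASK_COUNT; i++)
        if (mask_list[i].mask_name == p)
            return (int)i;
    return -1;
}

/* On load: map a stored index back to the shared string, or NULL if
   the index is out of range. */
static char *index_to_mask_name(int i)
{
    if (i < 0 || (size_t)i >= MASK_COUNT)
        return NULL;
    return mask_list[i].mask_name;
}
```

This keeps the aliasing intact after a reload, but only because the serializer has been told about mask_list; a generic code generator could not infer that from the struct declaration alone.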

dmckee --- ex-moderator kitten answered Sep 25 '22 07:09