
How to serialize a struct in C?

Tags: c, networking

I have a struct that consists of several primitive data types, pointers, and struct pointers. I want to send it over a socket so that it can be used at the other end. Since I want to pay the serialization cost upfront, how do I initialize an object of that struct so that it can be sent immediately without marshalling? For example:

struct A {
    int i;  
    struct B *p;
};

struct B {
    long l;
    char *s[0];
};

struct A *obj; 

// how do I initialize obj?
int len = sizeof(struct A) + sizeof(struct B) + sizeof(?);
obj = (struct A *) malloc(len);
...

write(socket, obj, len);

// on the receiver end, I want to do this
char buf[len];

read(socket, buf, len);
struct A *obj = (struct A *)buf;
int i = obj->i;
char *s = obj->p->s[0];

Thank you.

cody asked Mar 29 '13 17:03



2 Answers

The simplest way to do this may be to allocate a chunk of memory to hold everything. For instance, consider a struct as follows:

typedef struct A {
  int v;
  char* str;
} our_struct_t;

To send it, define a fixed format and pack the struct's contents into an array of bytes. I will try to show an example:

// Needs <stdlib.h> for malloc and <string.h> for strlen/memcpy
int sLen = 0;
int tLen = 0;
char* serialized = 0;
char* metadata = 0;
char* xval = 0;
char* xstr = 0;
our_struct_t x;
x.v   = 10;
x.str = "Our String";
sLen  = (int)strlen(x.str); // Assuming null-terminated (which ours is)
tLen  = sizeof(int) + sLen; // Our struct has an int and a string - we want the whole string not a mem addr
serialized = malloc(sizeof(char) * (tLen + sizeof(int))); // We have an additional sizeof(int) for metadata - this will hold our string length
metadata = serialized;
xval = serialized + sizeof(int);
xstr = xval + sizeof(int);
memcpy(metadata, &sLen, sizeof(int)); // Pack our metadata (memcpy avoids unaligned int access)
memcpy(xval, &x.v, sizeof(int)); // Our "v" value (1 int)
memcpy(xstr, x.str, sLen); // A full copy of our string (without the terminating '\0')

So this example copies the data into an array of size 2 * sizeof(int) + sLen which allows us a single integer of metadata (i.e. string length) and the extracted values from the struct. To deserialize, you could imagine something as follows:

char* serialized; // Assume this already points to the received bytes
char* metadata = serialized;
char* yval = metadata + sizeof(int);
char* ystr = yval + sizeof(int);
our_struct_t y;
int sLen;
memcpy(&sLen, metadata, sizeof(int)); // Unpack the string length
memcpy(&y.v, yval, sizeof(int)); // Unpack "v"
y.str = malloc((sLen + 1) * sizeof(char)); // +1 to null-terminate
memcpy(y.str, ystr, sLen);
y.str[sLen] = '\0';

As you can see, our array of bytes is well-defined. Below I have detailed the structure:

  • Bytes 0-3 : Meta-data (string length), assuming a 4-byte int
  • Bytes 4-7 : X.v (value)
  • Bytes 8 - (8 + sLen - 1) : X.str (value, sLen bytes, no terminating '\0')
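
For reference, here is a minimal sketch of the same layout wrapped into a pair of functions. The names serialize_a and deserialize_a are my own (hypothetical), and the sketch still assumes that both ends use the same int size and byte order, exactly like the fragments above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct A {
  int v;
  char* str;
} our_struct_t;

/* Hypothetical helper: packs x as [int sLen][int v][sLen string bytes].
   Returns a malloc'd buffer (caller frees) and stores its size in *outLen.
   Assumes both ends share the same int size and byte order. */
char* serialize_a(const our_struct_t* x, size_t* outLen)
{
  assert(x != NULL && x->str != NULL && outLen != NULL);
  int sLen = (int)strlen(x->str);
  size_t total = 2 * sizeof(int) + (size_t)sLen;
  char* buf = malloc(total);
  if (buf == NULL) return NULL;
  memcpy(buf, &sLen, sizeof(int));                      /* metadata */
  memcpy(buf + sizeof(int), &x->v, sizeof(int));        /* v */
  memcpy(buf + 2 * sizeof(int), x->str, (size_t)sLen);  /* string, no '\0' */
  *outLen = total;
  return buf;
}

/* Hypothetical helper: rebuilds the struct from a buffer of bufLen bytes.
   Returns 0 on success, -1 if the buffer is too short or allocation fails.
   y->str is malloc'd; the caller frees it. */
int deserialize_a(const char* buf, size_t bufLen, our_struct_t* y)
{
  int sLen;
  if (bufLen < 2 * sizeof(int)) return -1;
  memcpy(&sLen, buf, sizeof(int));
  if (sLen < 0 || bufLen - 2 * sizeof(int) < (size_t)sLen) return -1;
  memcpy(&y->v, buf + sizeof(int), sizeof(int));
  y->str = malloc((size_t)sLen + 1);
  if (y->str == NULL) return -1;
  memcpy(y->str, buf + 2 * sizeof(int), (size_t)sLen);
  y->str[sLen] = '\0';
  return 0;
}
```

The length check in deserialize_a is the one thing the fragments above leave out: a receiver should not trust the string length in the metadata before confirming that many bytes actually arrived.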

This kind of well-defined structure allows you to recreate the struct in any environment, as long as both sides follow the defined convention. How you send it over the socket depends on how you design your protocol. You can first send an integer containing the total length of the packet you just constructed, or you can expect the metadata to arrive first (logically separate, although it can still be sent in the same write) so that the receiving side knows how much data is still to come. For instance, if I receive a metadata value of 10, then I can expect sizeof(int) + 10 bytes to follow to complete the struct. With a 4-byte int, that is 14 bytes.
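
A rough sketch of the first option (send the total length, then the payload) might look like the following. It assumes a POSIX system; write_all, read_all, send_packet and recv_packet are hypothetical names of mine, and the 4-byte length prefix in network byte order is just one possible convention. The loops matter because read and write on a socket may transfer fewer bytes than requested.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>     /* read, write */

/* Write exactly n bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void* data, size_t n)
{
  const char* p = data;
  while (n > 0) {
    ssize_t w = write(fd, p, n);
    if (w < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    p += w;
    n -= (size_t)w;
  }
  return 0;
}

/* Read exactly n bytes. Returns 0, or -1 on error or early end of stream. */
static int read_all(int fd, void* data, size_t n)
{
  char* p = data;
  while (n > 0) {
    ssize_t r = read(fd, p, n);
    if (r < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    if (r == 0) return -1; /* peer closed before the packet was complete */
    p += r;
    n -= (size_t)r;
  }
  return 0;
}

/* Send a 4-byte big-endian length, then the payload itself. */
int send_packet(int fd, const void* payload, uint32_t len)
{
  uint32_t netLen = htonl(len);
  assert(payload != NULL || len == 0);
  if (write_all(fd, &netLen, sizeof netLen) != 0) return -1;
  return write_all(fd, payload, len);
}

/* Receive one packet. Returns a malloc'd payload (caller frees) or NULL. */
void* recv_packet(int fd, uint32_t* len)
{
  uint32_t netLen;
  if (read_all(fd, &netLen, sizeof netLen) != 0) return NULL;
  *len = ntohl(netLen);
  void* payload = malloc(*len ? *len : 1);
  if (payload == NULL) return NULL;
  if (read_all(fd, payload, *len) != 0) {
    free(payload);
    return NULL;
  }
  return payload;
}
```

A real receiver would probably also reject lengths above some sane maximum before calling malloc, since the length comes from the other side.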

EDIT

I will list some clarifications as requested in the comments.

I do a full copy of the string so it is in (logically) contiguous memory. That is, all the data in my serialized packet is actual data - there are no pointers. This way, we can send a single buffer (we call it serialized) over the socket. If we simply sent the pointer, the receiver would expect that pointer to be a valid memory address. However, it is unlikely that the memory addresses will be exactly the same on both sides. Even if they are, the receiver will not have the same data at that address as you do (except in very limited and specialized circumstances).

Hopefully this point is made more clear by looking at the deserialization process (this is on the receiver's side). Notice how I allocate a struct to hold the information sent by the sender. If the sender did not send me the full string but instead only the memory address, I could not actually reconstruct the data which was sent (even on the same machine we have two distinct virtual memory spaces which are not the same). So in essence, a pointer is only a good mapping for the originator.

Finally, as far as "structs within structs" go, you will need a serialization function for each struct. That said, the functions can build on one another. For instance, if I have two structs A and B where A contains B, I can have two serialize functions:

char* serializeB(void)
{
  char* serialized = 0;
  // ... Do serialization of B into a newly allocated buffer
  return serialized;
}

char* serializeA(void)
{
  char* b = serializeB();
  // ... Either add on to the serialized version of B or do some other modifications to combine the structures
  return b;
}

So you should be able to get away with a single serialization method for each struct.
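
To make that a bit more concrete, here is one possible shape for it, where each function writes into a caller-supplied buffer and returns the number of bytes it used. This is only a sketch: the struct definitions, the signatures, and the deserialize counterparts are my own assumptions (the embedded struct B here is a simplified stand-in), and fields are copied in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structs: A contains a B by value. */
struct B { long l; };
struct A { int i; struct B b; };

/* Writes B's fields to out and returns the number of bytes written. */
size_t serializeB(const struct B* b, unsigned char* out)
{
  assert(b != NULL && out != NULL);
  memcpy(out, &b->l, sizeof b->l);
  return sizeof b->l;
}

/* Writes A's own fields, then delegates the embedded B to serializeB. */
size_t serializeA(const struct A* a, unsigned char* out)
{
  size_t n = 0;
  assert(a != NULL && out != NULL);
  memcpy(out + n, &a->i, sizeof a->i);
  n += sizeof a->i;
  n += serializeB(&a->b, out + n);
  return n;
}

/* Reads B's fields from in and returns the number of bytes consumed. */
size_t deserializeB(const unsigned char* in, struct B* b)
{
  memcpy(&b->l, in, sizeof b->l);
  return sizeof b->l;
}

/* Reads A's own fields, then delegates the embedded B to deserializeB. */
size_t deserializeA(const unsigned char* in, struct A* a)
{
  size_t n = 0;
  memcpy(&a->i, in + n, sizeof a->i);
  n += sizeof a->i;
  n += deserializeB(in + n, &a->b);
  return n;
}
```

Returning the byte count lets the outer function keep a running offset, so it never needs to know how B is laid out internally.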

RageD answered Sep 18 '22 08:09


This answer is separate from the problems with your malloc.

Unfortunately, you cannot find a nice trick that would still be compatible with the standard. The only way of properly serializing a structure is to separately dissect each element into bytes, write them to an unsigned char array, send them over the network and put the pieces back together on the other end. In short, you would need a lot of shifting and bitwise operations.

In certain cases you would need to define a kind of protocol. In your case, for example, you need to make sure you always put the object p points to right after struct A, so that once recovered, you can set the pointer properly. Has everyone said often enough already that you can't send pointers over the network?

Another protocol-like thing you may want to do is to write the size allocated for the flexible array member s in struct B. Whatever layout you choose for your serialized data, both sides must obviously respect it.
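
As an illustration only, one possible layout for the structs in the question is sketched below: A's field, then B's field right after it, then the number of strings, then each string prefixed by its length. The function names, the count parameter, and the layout itself are assumptions of mine. For brevity it copies fields in host representation and trusts the incoming buffer, so it is only suitable between identical machines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct B {
  long l;
  char* s[];  /* flexible array member (C99 spelling of s[0]) */
};

struct A {
  int i;
  struct B* p;
};

/* Copy n bytes to *pos and advance the write cursor. */
static void put(unsigned char** pos, const void* src, size_t n)
{
  memcpy(*pos, src, n);
  *pos += n;
}

/* Free an A whose first count strings were malloc'd. */
void free_A(struct A* a, uint32_t count)
{
  for (uint32_t k = 0; k < count; k++) free(a->p->s[k]);
  free(a->p);
  free(a);
}

/* Layout: [i][l][count][len0][bytes0][len1][bytes1]...
   count is passed in because struct B does not record it.
   Returns a malloc'd buffer (caller frees), size in *outLen. */
unsigned char* serialize_A(const struct A* a, uint32_t count, size_t* outLen)
{
  assert(a != NULL && a->p != NULL && outLen != NULL);
  size_t total = sizeof a->i + sizeof a->p->l + sizeof count;
  for (uint32_t k = 0; k < count; k++)
    total += sizeof(uint32_t) + strlen(a->p->s[k]);
  unsigned char* buf = malloc(total);
  if (buf == NULL) return NULL;
  unsigned char* pos = buf;
  put(&pos, &a->i, sizeof a->i);
  put(&pos, &a->p->l, sizeof a->p->l);  /* B follows A directly */
  put(&pos, &count, sizeof count);
  for (uint32_t k = 0; k < count; k++) {
    uint32_t len = (uint32_t)strlen(a->p->s[k]);
    put(&pos, &len, sizeof len);
    put(&pos, a->p->s[k], len);
  }
  *outLen = total;
  return buf;
}

/* Rebuilds A and B, setting the pointers locally. No bounds checks. */
struct A* deserialize_A(const unsigned char* buf, uint32_t* count)
{
  const unsigned char* pos = buf;
  long l;
  struct A* a = malloc(sizeof *a);
  if (a == NULL) return NULL;
  memcpy(&a->i, pos, sizeof a->i);    pos += sizeof a->i;
  memcpy(&l, pos, sizeof l);          pos += sizeof l;
  memcpy(count, pos, sizeof *count);  pos += sizeof *count;
  a->p = malloc(sizeof *a->p + *count * sizeof(char*));
  if (a->p == NULL) { free(a); return NULL; }
  a->p->l = l;
  for (uint32_t k = 0; k < *count; k++) {
    uint32_t len;
    memcpy(&len, pos, sizeof len);    pos += sizeof len;
    a->p->s[k] = malloc((size_t)len + 1);
    if (a->p->s[k] == NULL) { free_A(a, k); return NULL; }
    memcpy(a->p->s[k], pos, len);     pos += len;
    a->p->s[k][len] = '\0';
  }
  return a;
}
```

Note that no pointer value ever goes into the buffer. The receiver allocates its own B and its own strings, and the pointers in the rebuilt struct refer to that local memory.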

It is important to note that you cannot rely on anything machine-specific such as byte order, structure padding, or the size of basic types. This means that you should serialize each field separately and assign each one a fixed number of bytes.

Shahbaz answered Sep 19 '22 08:09