
Standard C++ code for serialization/deserialization purposes

I've been working with hardware APIs for a long time, and almost all of the APIs I've worked with had a C interface. So, many times I was working with naked new calls, unsafe buffering and lots of C functions wrapped in C++ code. In the end, the boundary between pure C code and pure C++ code got blurred in my mind (and I don't know whether clarifying this boundary is useful at all).

Now, due to some new coding style requirements, I need to refactor all the code suspected of being insecure into safer code written in C++ (assuming that the C++ code would be safer). The final goal is to increase code security using the tools that C++ brings.

So, in order to get rid of all my confusion, I'm asking for help with a couple of C/C++ topics.

memcpy vs std::copy

AFAIK memcpy is a function from the C libraries, so it isn't C++-ish; on the other hand, std::copy is a function in the STL, so it's pure C++.

  • But is this true? After all, typical implementations of std::copy end up calling std::memmove or std::memcpy (from the <cstring> header) when the data is trivially copyable.
  • Would refactoring all the memcpy calls into std::copy calls make the code more "pure C++"?
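For trivially copyable element types, the two functions below leave the destination with the same contents; the difference is in what the compiler can check. std::copy counts elements of a known type, while memcpy counts bytes and accepts any pointer type, so a wrong size or a wrong pointer still compiles. Whether a particular standard library lowers std::copy to memmove is an implementation detail, not a guarantee. A minimal sketch (the function names are illustrative only):

```cpp
#include <algorithm>
#include <array>
#include <cstring>

// std::copy: the count is in elements and the element types must be
// assignable, so mismatched types are rejected at compile time.
std::array<int, 4> copyWithStdCopy(const std::array<int, 4> &src)
{
    std::array<int, 4> dst{};
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}

// memcpy: the count is in bytes and both pointers decay to void*, so
// passing the wrong size or the wrong type compiles silently.
std::array<int, 4> copyWithMemcpy(const std::array<int, 4> &src)
{
    std::array<int, 4> dst{};
    std::memcpy(dst.data(), src.data(), sizeof(int) * src.size());
    return dst;
}
```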

To deal with the new code style requirements, I've decided to go ahead with the memcpy refactor after all, but I have some questions about memcpy and std::copy:

memcpy is not type-safe, because it works with raw void pointers that accept any kind of pointer regardless of its type, but at the same time it is very flexible; std::copy lacks this flexibility but guarantees type safety. At first sight, memcpy looks like the best choice for serialization and deserialization routines (that's my real use case, indeed), for example, to send some values through a custom serial port library:

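#include <cstring>
#include <string>

// Provided by the custom serial port library (signature assumed here).
void sendBuffer(const unsigned char *buffer, std::size_t size);

void send(const std::string &value)
{
    const std::string::size_type Size(value.size());
    const std::string::size_type TotalSize(sizeof(Size) + value.size());
    unsigned char *Buffer = new unsigned char[TotalSize];
    unsigned char *Current = Buffer;

    // Length prefix: the raw bytes of Size, in host byte order.
    std::memcpy(Current, &Size, sizeof(Size));
    Current += sizeof(Size);

    // The characters themselves (no terminating '\0').
    std::memcpy(Current, value.c_str(), Size);

    sendBuffer(Buffer, TotalSize);

    delete []Buffer;
}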

The code above works fine, but it looks horrible: we're bypassing the std::string encapsulation by accessing its internal memory through the std::string::c_str() method, we need to take care of allocating and deallocating dynamic memory, play with pointers and treat all values as unsigned chars (see the next part). The question is: is there a better way to do this?
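One way to keep the byte-wise copy while removing the manual new[]/delete[] is to let a std::vector<unsigned char> own the buffer and confine memcpy to a small helper that only accepts trivially copyable types. This is a sketch under that assumption; appendBytes and makeMessage are hypothetical names, and the length prefix is still written in the host's byte order and size_type width:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: appends the object representation of a trivially
// copyable value. The static_assert rejects types such as std::string,
// whose bytes are meaningless outside the current process.
template <typename T>
void appendBytes(std::vector<unsigned char> &buffer, const T &value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-wise");
    const std::size_t offset = buffer.size();
    buffer.resize(offset + sizeof(T));
    std::memcpy(buffer.data() + offset, &value, sizeof(T));
}

// Builds the same length-prefixed message as send(); the vector owns the
// memory, so nothing leaks if a later call throws.
std::vector<unsigned char> makeMessage(const std::string &value)
{
    std::vector<unsigned char> buffer;
    appendBytes(buffer, value.size());
    buffer.insert(buffer.end(), value.begin(), value.end());
    return buffer;
}
```

The caller would then pass buffer.data() and buffer.size() to sendBuffer, so the raw pointer only appears at the C-interface boundary.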

My first attempts at solving the above problems using std::copy don't satisfy me altogether:

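#include <algorithm>
#include <string>
#include <vector>

// Provided by the custom serial port library (signature assumed here).
void sendBuffer(const unsigned char *buffer, std::size_t size);

void send(const std::string &value)
{
    const std::string::size_type Size(value.size());
    const std::string::size_type TotalSize(sizeof(Size) + value.size());

    std::vector<unsigned char> Buffer(TotalSize, 0);

    // Copy the bytes of Size: std::copy(&Size, &Size + 1, ...) would copy one
    // size_type value into a single unsigned char, truncating it.
    const unsigned char *SizeBytes = reinterpret_cast<const unsigned char *>(&Size);
    std::copy(SizeBytes, SizeBytes + sizeof(Size), Buffer.begin());
    std::copy(value.begin(), value.end(), Buffer.begin() + sizeof(Size));

    sendBuffer(Buffer.data(), TotalSize);
}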

With the above approach, memory management isn't a problem anymore: the std::vector takes responsibility for allocating, storing and finally deallocating the data at the end of the scope. But mixing std::copy with pointer arithmetic and iterator arithmetic is pretty annoying, and in the end I'm bypassing the std::vector encapsulation in the sendBuffer call anyway.
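The offset arithmetic on Buffer.begin() can be avoided by appending instead of writing into a pre-sized vector: std::back_inserter turns each element std::copy produces into a push_back. A minimal sketch (buildMessage is a hypothetical name; the length is still copied as raw host-order bytes):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Builds the length-prefixed message without pre-sizing the vector or
// computing offsets: std::back_inserter appends every copied element.
std::vector<unsigned char> buildMessage(const std::string &value)
{
    const std::string::size_type size = value.size();
    std::vector<unsigned char> buffer;
    buffer.reserve(sizeof(size) + size);   // optional, avoids reallocations

    // View the length as its individual bytes before copying them.
    const unsigned char *sizeBytes = reinterpret_cast<const unsigned char *>(&size);
    std::copy(sizeBytes, sizeBytes + sizeof(size), std::back_inserter(buffer));
    std::copy(value.begin(), value.end(), std::back_inserter(buffer));
    return buffer;
}
```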

After the previous attempts, I coded something with std::stringstream, but the results were even worse, and now I'm wondering:

  • Is there a way to serialize objects and values safely, without breaking encapsulation, without excessive or confusing pointer/iterator arithmetic and without manual dynamic memory management, or is it just an impossible goal? (Yes, I've heard about boost::serialization, but for now I'm not allowed to integrate it.)
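A common alternative to copying object bytes is to encode each field explicitly with a fixed width and byte order using shifts, inside a small class that owns its buffer. This is a swapped-in technique, not something from the question: ByteWriter is a hypothetical name, and little-endian with a 32-bit length prefix is an arbitrary choice for the sketch. No pointer arithmetic or manual memory management is involved, and the output does not depend on the host's sizeof(std::size_t) or endianness:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical writer: owns its buffer and encodes every field with an
// explicit width and byte order (little-endian here).
class ByteWriter
{
public:
    void writeU32(std::uint32_t value)
    {
        for (int shift = 0; shift < 32; shift += 8)
            bytes_.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
    }

    // Length-prefixed string; throws if the length does not fit in 32 bits.
    void writeString(const std::string &value)
    {
        if (value.size() > 0xFFFFFFFFu)
            throw std::length_error("string too long for a 32-bit prefix");
        writeU32(static_cast<std::uint32_t>(value.size()));
        bytes_.insert(bytes_.end(), value.begin(), value.end());
    }

    const std::vector<unsigned char> &bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};
```

A call such as sendBuffer(w.bytes().data(), w.bytes().size()) would then hand the result to the serial library.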

And:

  • What's the best use of std::copy for serialization/deserialization purposes (if any)?
  • Is the rationale of std::copy limited to copying containers or arrays, and is using it on raw memory a bad choice?
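On the deserialization side, std::copy can fill an object's representation by treating the destination object as an array of unsigned char. This is well-defined for trivially copyable types. The sketch below assumes a trivially copyable, default-constructible T and uses a hypothetical name, readValue; it checks bounds before copying:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: reads a trivially copyable T from buffer at offset by
// copying bytes into the object's representation, then advances offset.
template <typename T>
T readValue(const std::vector<unsigned char> &buffer, std::size_t &offset)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "readValue requires a trivially copyable type");
    if (offset > buffer.size() || buffer.size() - offset < sizeof(T))
        throw std::out_of_range("buffer too short");

    T value;   // assumes T is default-constructible
    const unsigned char *src = buffer.data() + offset;
    unsigned char *dest = reinterpret_cast<unsigned char *>(&value);
    std::copy(src, src + sizeof(T), dest);
    offset += sizeof(T);
    return value;
}
```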

malloc/free vs new/delete vs std::allocator

The other big topic is memory allocation. AFAIK the malloc/free functions aren't forbidden in C++ even though they come from C. And the new/delete operators come from C++ and aren't ANSI C.

  • Am I right?
  • Can new/delete be used in ANSI C?
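For reference: in C++, malloc/free are available through <cstdlib> and only hand back uninitialised bytes, while new/delete also run constructors and destructors. C has no new or delete keywords, so they cannot be used in ANSI C. Each allocation must be released by its matching counterpart (malloc with free, new with delete, new[] with delete[]). A short sketch with illustrative function names:

```cpp
#include <cstdlib>
#include <new>
#include <string>

// malloc returns void* to uninitialised bytes: C++ requires an explicit
// cast, and no constructor runs, so this only suits types like int.
int sumWithMalloc(std::size_t count)
{
    int *values = static_cast<int *>(std::malloc(count * sizeof(int)));
    if (values == nullptr && count != 0)
        throw std::bad_alloc();
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        values[i] = static_cast<int>(i);
        sum += values[i];
    }
    std::free(values);                       // malloc pairs with free
    return sum;
}

// new[] is typed and runs each element's constructor (empty std::strings
// here); delete[] runs the destructors before releasing the memory.
std::size_t totalLengthWithNew(std::size_t count)
{
    std::string *names = new std::string[count];
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        names[i] = std::string(i, 'x');      // i characters
        total += names[i].size();
    }
    delete[] names;                          // new[] pairs with delete[]
    return total;
}
```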

Since I need to refactor all C-flavoured code into C++ code, I'm getting rid of all the malloc/free calls spread around some legacy code, and I've found that reserving dynamic memory is quite confusing: the void type doesn't carry any information about size, so it's impossible to reserve a data buffer using void as the type:

void *Buffer = new void[100]; // <-- How many bytes is each 'void'?
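new void[100] is indeed ill-formed, but if a void * to N raw bytes is really what is wanted, the global allocation function ::operator new(std::size_t) returns exactly that, and ::operator delete releases it; no objects are constructed. In most code a std::vector<unsigned char> expresses the same thing with automatic cleanup. A minimal sketch (function names are illustrative):

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Raw, untyped storage: 100 bytes as void*, released with ::operator delete.
bool rawStorageDemo()
{
    void *raw = ::operator new(100);         // throws std::bad_alloc on failure
    std::memset(raw, 0xAB, 100);
    const bool ok = static_cast<unsigned char *>(raw)[99] == 0xAB;
    ::operator delete(raw);
    return ok;
}

// The usual alternative: n bytes, value-initialised to zero and freed
// automatically when the vector goes out of scope.
std::vector<unsigned char> makeZeroedBuffer(std::size_t n)
{
    return std::vector<unsigned char>(n);
}
```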

Due to the lack of pure raw-binary-data pointers, it is common practice to create pointers to unsigned char: char so that the element count equals the byte count (sizeof(char) is 1), and unsigned to avoid unexpected signed/unsigned conversions during the data copy. Maybe it's common practice, but it's a mess... unsigned char isn't int, nor float, nor my_awesome_serialization_struct. If I'm forced to choose some kind of dummy pointer to binary data, I would prefer void * over unsigned char *.
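C++17 (which postdates this question) added std::byte in <cstddef>, a distinct type meant for raw memory. It supports bitwise operators but no arithmetic, and converting it to a number requires an explicit std::to_integer, so a byte cannot silently be mistaken for an int. A minimal sketch, with hypothetical function names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies the object representation of a 32-bit value into std::byte storage.
std::vector<std::byte> objectBytes(std::uint32_t value)
{
    std::vector<std::byte> bytes(sizeof(value));
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// Sums the byte values; note that `b + 1` would not compile for std::byte.
unsigned byteSum(const std::vector<std::byte> &bytes)
{
    unsigned sum = 0;
    for (std::byte b : bytes)
        sum += std::to_integer<unsigned>(b);
    return sum;
}
```

For 0x01020304 the bytes are 1, 2, 3 and 4 in some platform-dependent order, so their sum is 10 either way.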

So when I need a dynamic buffer for serialization/deserialization purposes, there's no way I can avoid the unsigned char * stuff if I want to refactor into type-safe buffer management; but while I was rage-refactoring all the malloc/free pairs into new/delete pairs, I read about std::allocator.
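Instead of turning malloc/free pairs into new[]/delete[] pairs, the array can be handed to a std::unique_ptr<unsigned char[]>, which calls delete[] automatically, even when an exception escapes. A sketch of such a refactor; OwnedBuffer and copyToOwnedBuffer are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical replacement for a legacy "malloc, fill, free" buffer.
struct OwnedBuffer
{
    std::unique_ptr<unsigned char[]> data;   // owns the array, calls delete[]
    std::size_t size;
};

OwnedBuffer copyToOwnedBuffer(const void *source, std::size_t size)
{
    OwnedBuffer buffer{std::unique_ptr<unsigned char[]>(new unsigned char[size]), size};
    if (size != 0)
        std::memcpy(buffer.data.get(), source, size);
    return buffer;                           // moved out; no manual cleanup
}
```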

std::allocator allows reserving memory chunks in a type-safe way. At first sight I bet it would be useful, but there's no great difference between allocating with std::allocator<int>::allocate and new int, or so I thought; the same goes for std::allocator<int>::deallocate and delete.
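The main difference from new is that std::allocator separates obtaining storage from constructing objects: allocate(n) returns room for n objects but constructs none. That separation is what standard containers such as std::vector rely on, and the allocator template parameter is mainly a customization point for those containers. It is rarely useful to call it directly for a serialization buffer. A sketch of the separate steps, using std::allocator_traits (allocator::construct was deprecated in C++17 and removed in C++20); allocatorDemo is an illustrative name:

```cpp
#include <cstddef>
#include <memory>
#include <string>

std::size_t allocatorDemo()
{
    std::allocator<std::string> alloc;
    using Traits = std::allocator_traits<std::allocator<std::string>>;

    std::string *p = Traits::allocate(alloc, 2);   // storage only, no strings yet
    Traits::construct(alloc, p, "hello");          // construct element 0
    Traits::construct(alloc, p + 1, "abc");        // construct element 1
    const std::size_t total = p[0].size() + p[1].size();

    Traits::destroy(alloc, p + 1);                 // run destructors explicitly
    Traits::destroy(alloc, p);
    Traits::deallocate(alloc, p, 2);               // then release the storage
    return total;
}
```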

And now I've lost my bearings on dynamic memory management; that's why I'm asking:

  • Is there a good C++ practice for dynamic memory management in serialization/deserialization that guarantees type-safe management?
  • Is it possible to avoid using const char * for serialization/deserialization memory buffers?
  • What's the rationale behind std::allocator, and what is its use in the serialization/deserialization scope (if any)?
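On the reading side, a small class can keep the buffer and a cursor private, check bounds on every read, and rebuild values field by field, so no char * buffer is exposed and nothing is allocated by hand. This sketch assumes a little-endian, 32-bit length-prefixed string format chosen for illustration; ByteReader is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader; it never reads past the end of the buffer.
// The referenced vector must outlive the reader.
class ByteReader
{
public:
    explicit ByteReader(const std::vector<unsigned char> &bytes) : bytes_(bytes) {}

    std::uint32_t readU32()
    {
        require(4);
        std::uint32_t value = 0;
        for (int i = 0; i < 4; ++i)
            value |= static_cast<std::uint32_t>(bytes_[pos_ + i]) << (8 * i);
        pos_ += 4;
        return value;
    }

    std::string readString()
    {
        const std::uint32_t length = readU32();
        require(length);
        std::string value(bytes_.data() + pos_, bytes_.data() + pos_ + length);
        pos_ += length;
        return value;
    }

private:
    void require(std::size_t count) const
    {
        if (bytes_.size() - pos_ < count)
            throw std::runtime_error("truncated input");
    }

    const std::vector<unsigned char> &bytes_;
    std::size_t pos_ = 0;
};
```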

Thanks for your attention!

PaperBirdMaster asked Oct 15 '12 at 13:10





1 Answer

My experience is that type safety in C++ means not only that the compiler complains about type mismatches. Rather, it means that in general you should not have to care about the memory layout of your data. In fact, the C++ standard places only very few requirements on the memory layout of data types.

Your serialization is based on direct memory access, so I'm afraid there won't be a simple "pure" C++ solution, and in particular no general compiler/platform-independent solution.
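Two of the layout details that make raw byte copies non-portable can be observed directly: the byte order of integers and the padding inside structs. Both are properties of the platform and compiler, not of the program. C++20 also exposes the byte order as std::endian in <bit>. A minimal sketch with illustrative names:

```cpp
#include <cstdint>
#include <cstring>

// Returns true on little-endian hosts by inspecting the first byte of 1.
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Padding is implementation-defined: on common platforms this struct is
// larger than sizeof(char) + sizeof(std::uint32_t), so copying it byte-wise
// also copies padding bytes whose position differs between ABIs.
struct Padded
{
    char tag;
    std::uint32_t value;
};
```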

bjhend answered Oct 19 '22 at 23:10
