
How to do serialization of Class having members of custom data types in C++? [closed]

I want to serialize and deserialize a class Mango, so I have created the functions serialize and deserialize, respectively.

? serialize(Mango &Man) /// What should the return type be?
{
}

Mango deserialize(  ?   ) /// What should the function parameter be?
{
}

I don't know how to implement this efficiently in terms of speed, portability, and memory, because the class contains 10 members of custom data types (I mention just one, but they are all similar), which are themselves very complex.

I would like suggestions for the implementation. For example, what should the return type of the serialize function be? A vector of bytes, i.e. std::vector<uint8_t> serialize(Mango &Man)? Or should it return nothing and just serialize into bytes stored in memory? Or is there some other way?

Mango Class

class Mango
{
public:
    const MangoType &getMangoType() const { return typeMan; }
    MangoType &getMangoType() { return typeMan; }

private:
    // There are many members of different types : I just mention one.
    MangoType typeMan;
};

Data type classes

//MangoType Class
class MangoType
{
    /// It only has one member ie content
public:
    /// Getter of content vector.

    std::vector<FuntionMango> &getContent() noexcept { return Content; }

private:
    /// \name Data of MangoType.
    
    std::vector<FuntionMango> Content;
    
};


/// FuntionMango class.
class FuntionMango
{
public:
    /// Getter of param types.
    const std::vector<ValType> &getParamTypes() const noexcept
    {
        return ParamTypes;
    }
    std::vector<ValType> &getParamTypes() noexcept { return ParamTypes; }

    /// Getter of return types.
    const std::vector<ValType> &getReturnTypes() const noexcept
    {
        return ReturnTypes;
    }
    std::vector<ValType> &getReturnTypes() noexcept { return ReturnTypes; }

    

private:
    /// \name Data of FuntionMango.
   
    std::vector<ValType> ParamTypes;
    std::vector<ValType> ReturnTypes;

};

//ValType Class
  
enum class ValType : uint8_t
  {
     #define UseValType
     #define Line(NAME, VALUE, STRING) NAME = VALUE
     #undef Line
     #undef UseValType
  };

I want to know the best possible implementation plan in terms of speed and memory for serialize and deserialize functions.

Notes:

1) I do not want to transfer it over the network. My use case is that loading the data into the Mango class is very time-consuming every time (it is the result of a computation), so I want to serialize it once and simply deserialize the previously serialized data the next time I need it.
2) I do not want to use a library that requires linking, such as Boost Serialization. Is there any way to use it header-only?

James_sheford asked Sep 20 '25 16:09


2 Answers

I commented:

Perhaps the examples here give you some inspiration (it's possible to write them without any Boost, obviously): "Boost Serialization Binary Archive giving incorrect output".

Because I hate it when people say "obviously" on a Q&A site, let me show you. I'd suggest an interface like this:

std::vector<uint8_t> serialize(Mango const& Man);
Mango                deserialize(std::span<uint8_t const> data);

Alternatively, for file IO you could support e.g.:

void serialize_to_stream(std::ostream& os, Mango const& Man);
void deserialize(std::istream& is, Mango& Man);

Using the approach from the linked example, the suggested implementations would look like:

std::vector<uint8_t> serialize(Mango const& Man) {
    std::vector<uint8_t> bytes;
    do_generate(back_inserter(bytes), Man);
    return bytes;
}

Mango deserialize(std::span<uint8_t const> data) {
    Mango result;
    auto  f = begin(data), l = end(data);
    if (!do_parse(f, l, result))
        throw std::runtime_error("deserialize");
    return result;
}

void serialize_to_stream(std::ostream& os, Mango const& Man)  {
    do_generate(std::ostreambuf_iterator<char>(os), Man);
}

void deserialize(std::istream& is, Mango& Man) {
    Man = {}; // clear it!
    std::istreambuf_iterator<char> f(is), l{};
    if (!do_parse(f, l, Man))
        throw std::runtime_error("deserialize");
}

Of course, that assumes do_generate and do_parse customizations exist for all the relevant types (ValType, FuntionMango, MangoType, Mango):

Live On Coliru

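#include <algorithm>
#include <cstdint>     // uint8_t, uint16_t, uint32_t
#include <iomanip>     // debug output
#include <iostream>
#include <iterator>    // back_inserter, (i/o)streambuf_iterator
#include <span>
#include <stdexcept>   // std::runtime_error
#include <string>
#include <type_traits> // is_trivially_copyable_v, underlying_type_t
#include <vector>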

namespace MangoLib {
    // your requested signatures:
    class Mango;

    void serialize_to_stream(std::ostream& os, Mango const& Man);
    void deserialize(std::istream& is, Mango& Man);
    std::vector<uint8_t> serialize(Mango const& Man);
    Mango                deserialize(std::span<uint8_t const> data);

    // your specified types (with some demo fill)
    enum class ValType : uint8_t {
#define UseValType
#define Line(NAME, VALUE, STRING) NAME = VALUE
        Line(void_,   0, "void"),
        Line(int_,    1, "int"),
        Line(bool_,   2, "bool"),
        Line(string_, 3, "string"),
#undef Line
#undef UseValType
    };

    using ValTypes = std::vector<ValType>;
    class FuntionMango {
      public:
        const ValTypes& getParamTypes() const noexcept { return ParamTypes; }
        ValTypes& getParamTypes() noexcept { return ParamTypes; }

        const ValTypes& getReturnTypes() const noexcept { return ReturnTypes; }
        ValTypes& getReturnTypes() noexcept { return ReturnTypes; }

      private:
        ValTypes ParamTypes, ReturnTypes;
    };

    using FuntionMangos = std::vector<FuntionMango>;

    class MangoType {
      public:
        FuntionMangos&       getContent() noexcept { return Content; }
        const FuntionMangos& getContent() const noexcept { return Content; }

      private:
        FuntionMangos Content;
    };

    class Mango {
      public:
        const MangoType& getMangoType() const { return typeMan; }
        MangoType&       getMangoType() { return typeMan; }

      private:
        MangoType typeMan;
        // many other members
    };
} // namespace MangoLib

namespace my_serialization_helpers {

    ////////////////////////////////////////////////////////////////////////////
    // This namespace serves as an extension point for your serialization; in
    // particular we choose endianness and representation of strings
    //
    // TODO add overloads as needed (signed integer types, binary floats,
    // containers of... etc)
    ////////////////////////////////////////////////////////////////////////////
    
    // decide on the max supported container capacity:
    using container_size_type = std::uint32_t;
    
    ////////////////////////////////////////////////////////////////////////////
    // generators
    template <typename Out>
    Out do_generate(Out out, std::string const& data) {
        container_size_type len = data.length();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        return std::copy(data.begin(), data.end(), out);
    }

    template <typename Out, typename T>
    Out do_generate(Out out, std::vector<T> const& data) {
        container_size_type len = data.size();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        for (auto& el : data)
            out = do_generate(out, el);
        return out;
    }

    template <typename Out> Out do_generate(Out out, uint8_t const& data) {
        return std::copy_n(&data, sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint16_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint32_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    ////////////////////////////////////////////////////////////////////////////
    // parsers
    template <typename It>
    bool parse_raw(It& in, It last, char* raw_into, size_t n) { // length guarded copy_n
        while (in != last && n) {
            *raw_into++ = *in++;
            --n;
        }
        return n == 0;
    }

    template <typename It, typename T>
    bool parse_raw(It& in, It last, T& into) {
        static_assert(std::is_trivially_copyable_v<T>);
        return parse_raw(in, last, reinterpret_cast<char*>(&into), sizeof(into));
    }

    template <typename It>
    bool do_parse(It& in, It last, std::string& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.resize(len);
        return parse_raw(in, last, data.data(), len);
    }

    template <typename It, typename T>
    bool do_parse(It& in, It last, std::vector<T>& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.clear();
        data.reserve(len);
        while (len--) {
            data.emplace_back();
            if (!do_parse(in, last, data.back()))
                return false;
        }
        return true;
    }

    template <typename It>
    bool do_parse(It& in, It last, uint8_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint16_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint32_t& data) {
        return parse_raw(in, last, data);
    }
}

namespace MangoLib {

    template <typename Out> Out do_generate(Out out, ValType const& x) {
        using my_serialization_helpers::do_generate;
        return do_generate(out,
                           static_cast<std::underlying_type_t<ValType>>(x));
    }
    template <typename It> bool do_parse(It& in, It last, ValType& x) {
        using my_serialization_helpers::do_parse;
        std::underlying_type_t<ValType> tmp;
        bool ok = do_parse(in, last, tmp);
        if (ok)
            x = static_cast<ValType>(tmp);
        return ok;
    }

    template <typename Out> Out do_generate(Out out, FuntionMango const& x) {
        using my_serialization_helpers::do_generate;
        out = do_generate(out, x.getParamTypes());
        out = do_generate(out, x.getReturnTypes());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, FuntionMango& x) {
        using my_serialization_helpers::do_parse;
        return do_parse(in, last, x.getParamTypes()) &&
            do_parse(in, last, x.getReturnTypes());
    }

    template <typename Out> Out do_generate(Out out, MangoType const& x) {
        using my_serialization_helpers::do_generate;
        out = do_generate(out, x.getContent());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, MangoType& x) {
        using my_serialization_helpers::do_parse;
        return do_parse(in, last, x.getContent());
    }

    template <typename Out> Out do_generate(Out out, Mango const& x) {
        out = do_generate(out, x.getMangoType());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, Mango& x) {
        return do_parse(in, last, x.getMangoType());
    }
}

#include <cassert>

MangoLib::Mango makeMango() {
    MangoLib::Mango mango;

    using MangoLib::ValType;
    MangoLib::FuntionMango f1;
    f1.getParamTypes()  = {ValType::bool_, ValType::string_};
    f1.getReturnTypes() = {ValType::void_};

    MangoLib::FuntionMango f2;
    f2.getParamTypes()  = {ValType::string_};
    f2.getReturnTypes() = {ValType::int_};

    mango.getMangoType().getContent() = {f1, f2};
    return mango;
}

#include <fstream>

int main() {
    auto const mango = makeMango();

    auto const bytes = serialize(mango);
    auto const roundtrip = serialize(MangoLib::deserialize(bytes));
    assert(roundtrip == bytes);

    // alternatively with file IO:
    {
        std::ofstream ofs("output.bin", std::ios::binary);
        serialize_to_stream(ofs, mango);
    }
    // read back:
    {
        std::ifstream ifs("output.bin", std::ios::binary);
        MangoLib::Mango from_file;
        deserialize(ifs, from_file);

        assert(serialize(from_file) == bytes);
    }

    std::cout << "\nDebug dump " << std::dec << bytes.size() << " bytes:\n";
    for (auto ch : bytes)
        std::cout << "0x" << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>((uint8_t)ch) << " " << std::dec;
    std::cout << "\nDone\n";
}

// suggested implementations:
namespace MangoLib {
    std::vector<uint8_t> serialize(Mango const& Man) {
        std::vector<uint8_t> bytes;
        do_generate(back_inserter(bytes), Man);
        return bytes;
    }

    Mango deserialize(std::span<uint8_t const> data) {
        Mango result;
        auto  f = begin(data), l = end(data);
        if (!do_parse(f, l, result))
            throw std::runtime_error("deserialize");
        return result;
    }

    void serialize_to_stream(std::ostream& os, Mango const& Man)  {
        do_generate(std::ostreambuf_iterator<char>(os), Man);
    }

    void deserialize(std::istream& is, Mango& Man) {
        Man = {}; // clear it!
        std::istreambuf_iterator<char> f(is), l{};
        if (!do_parse(f, l, Man))
            throw std::runtime_error("deserialize");
    }
}

Which roundtrips correctly and prints the debug output:

Debug dump 25 bytes:
0x02 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x02 0x03 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x01 
Done

Portability

This assumes endianness is not an issue. If it is, you might want to normalize endianness. You can do it manually (e.g. using the ntoh/hton family), or you could use Boost Endian, which is header-only and therefore does not require linking to any Boost library.

E.g.: http://coliru.stacked-crooked.com/a/288829ec964a3ca9
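As a rough illustration of the manual route, one option is to write each integer byte by byte with shifts, which fixes the stored byte order regardless of the host. This is only a sketch: the names write_u32_le and read_u32_le are hypothetical (they are not part of the listing above), and little-endian is an arbitrary choice of on-disk order.

```cpp
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: emit a 32-bit value least-significant byte first,
// independent of the host's native byte order.
template <typename Out>
Out write_u32_le(Out out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8)
        *out++ = static_cast<std::uint8_t>((value >> shift) & 0xFFu);
    return out;
}

// Hypothetical helper: read the value back; returns false if the input
// runs out before four bytes have been consumed.
template <typename It>
bool read_u32_le(It& in, It last, std::uint32_t& value) {
    value = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        if (in == last)
            return false;
        value |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(*in++)) << shift;
    }
    return true;
}
```

Helpers of this shape could stand in for the reinterpret_cast-based copies used for the length prefixes and integer overloads; single-byte values such as ValType need no conversion.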

sehe answered Sep 22 '25 06:09


As @Eljay says in a comment, the exact solution depends on the use case.

For me, if it is a one-off project, the most straightforward "binary dump" method would be to reconsider your basic data types and store everything compactly, using fixed-size structures.

struct FuntionMango
{
    int NumParams; // valid items in Param/Return arrays
    int NumReturns;

    ValType ParamTypes[MAX_PARAMS];
    ValType ReturnTypes[MAX_RETURNS];
};

struct MangoType
{
    int NumContent; // valid items in Content array
    // Fixed array instead of vector<FuntionMango>
    FuntionMango Content[MAX_FUNCTIONS];
};

struct Mango // all fields are just 'public'
{
    MangoType typeMan;
};

Then the "save" procedure would be

void saveMango(const char* filename, Mango* mango)
{
    FILE* OutFile = fopen(...);
    fwrite(mango, 1, sizeof(Mango), OutFile);
    fclose(OutFile);
}

and load just uses "fread" (of course, all error handling and file integrity checking is omitted)

void loadMango(const char* filename, Mango* mango)
{
    FILE* InFile = fopen(...);
    fread(mango, 1, sizeof(Mango), InFile);
    fclose(InFile);
}
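For reference, here is a sketch of what the same pair might look like with the omitted error handling filled in. It does not use the Mango struct itself: Record, saveRecord and loadRecord are hypothetical stand-ins, and the sketch assumes the struct has no pointers and no non-trivial constructors or destructors.

```cpp
#include <cstdio>

// Hypothetical stand-in for the fixed-size Mango above.
struct Record {
    int           NumItems;   // valid entries in Items
    unsigned char Items[16];
};

// Returns true only if the whole record was written and the file closed cleanly.
bool saveRecord(const char* filename, const Record& rec) {
    std::FILE* out = std::fopen(filename, "wb");
    if (!out)
        return false;
    const bool written = std::fwrite(&rec, sizeof(Record), 1, out) == 1;
    const bool closed  = std::fclose(out) == 0;
    return written && closed;
}

// Reads into a temporary first, so `rec` is left untouched on a short or failed read.
bool loadRecord(const char* filename, Record& rec) {
    std::FILE* in = std::fopen(filename, "rb");
    if (!in)
        return false;
    Record tmp;
    const bool ok = std::fread(&tmp, sizeof(Record), 1, in) == 1;
    std::fclose(in);
    if (ok)
        rec = tmp;
    return ok;
}
```

Passing sizeof(Record) as the element size with a count of 1 makes a truncated file show up as a return value of 0 rather than a partial byte count.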

To convert your Mango into a byte array, just use a reinterpret_cast or a C-style cast.

Unfortunately, this approach fails if any of your structures contains pointer fields or has non-trivial constructors/destructors.
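One way to have the compiler enforce that restriction is a static_assert on std::is_trivially_copyable. The sketch below uses two hypothetical structs, FixedFunction and VectorFunction, purely to show which kind of type passes the check. Note that even for a type that passes, the dumped bytes include the compiler's padding and the host's byte order, so the file is only safely readable by a matching build.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size layout: plain data only, so a raw byte copy is valid.
struct FixedFunction {
    int          NumParams;
    std::uint8_t ParamTypes[8];
};

// Hypothetical vector-based layout: std::vector owns a heap pointer, so
// dumping its bytes would store a stale address, not the elements.
struct VectorFunction {
    std::vector<std::uint8_t> ParamTypes;
};

static_assert(std::is_trivially_copyable_v<FixedFunction>,
              "raw fwrite/memcpy is only valid for trivially copyable types");
static_assert(!std::is_trivially_copyable_v<VectorFunction>,
              "vector members rule out the raw dump");
```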

[EDIT (on request)]

Conversion to a byte array (filling a std::vector<uint8_t>) can be done using the iterator-range constructor of std::vector:

Mango mango;
uint8_t* rawPointer = reinterpret_cast<uint8_t*>(&mango);
std::vector<uint8_t> byteArray(rawPointer, rawPointer + sizeof(Mango));

And vice versa, convert byte array to Mango

Mango otherMango;
uint8_t* rawPointer2 = reinterpret_cast<uint8_t*>(&otherMango);
memcpy(rawPointer2, byteArray.data(), sizeof(Mango));
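
The two conversions could also be wrapped in small helpers that check the buffer size before copying. This is a sketch under the same trivially-copyable assumption; to_bytes and from_bytes are hypothetical names, not an existing API.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy the object representation of `value` into a byte vector.
template <typename T>
std::vector<std::uint8_t> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>);
    std::vector<std::uint8_t> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Hypothetical helper: rebuild a T from bytes, refusing buffers of the wrong size.
template <typename T>
std::optional<T> from_bytes(const std::vector<std::uint8_t>& bytes) {
    static_assert(std::is_trivially_copyable_v<T>);
    if (bytes.size() != sizeof(T))
        return std::nullopt;
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

With these, the two snippets above would reduce to to_bytes(mango) and from_bytes<Mango>(byteArray), where the latter yields an empty optional instead of reading past the end of a short buffer.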

Viktor Latypov answered Sep 22 '25 05:09