
What is the C/C++ equivalent of java.io.Serializable?

There are references to serialization libraries in:

  • Serialize Data Structures in C

And there are:

  • http://troydhanson.github.io/tpl/index.html
  • http://www.boost.org/doc/libs/1_41_0/libs/serialization/doc/index.html
  • https://developers.google.com/protocol-buffers/docs/cpptutorial#optimization-tips

But does such an equivalent even exist?

So if I have an interface as follows in Java, what would a serializable class in C/C++ look like?

import java.io.Serializable;

public interface SuperMan extends Serializable{

    /**
     * Count the number of abilities.
     * @return
     */
    public int countAbility();

    /**
     * Get the ability with index k.
     * @param k
     * @return
     */
    public long getAbility(int k);

    /**
     * Get the array of ability from his hand.
     * @param k
     * @return
     */
    public int[] getAbilityFromHand(int k);

    /**
     * Get the finger of the hand.
     * @param k
     * @return
     */
    public int[][] getAbilityFromFinger(int k);

    //check whether the finger with index k is removed.
    public boolean hasFingerRemoved(int k);

    /**
     * Remove the finger with index k.
     * @param k
     */
    public void removeFinger(int k);

}

Could any serializable C/C++ object just be inherited like in Java?

alvas asked Jun 09 '16 08:06



2 Answers

There is no single standard for this. In fact, every library can implement it in a different way. Here are some approaches that can be used:

  • the class has to be derived from a common base class and implement read() and write() virtual methods:

    class SuperMan : public BaseObj
    {
    public:
        virtual void read(Stream& stream);
        virtual void write(Stream& stream);
    };
    
  • the class should implement a special interface - in C++ this is done by deriving the class from a special abstract class. This is a variation of the previous method:

    class Serializable
    {
    public:
        virtual ~Serializable() {}
        virtual void read(Stream& stream) = 0;
        virtual void write(Stream& stream) = 0;
    };
    
    class SuperMan : public Man, public Serializable
    {
    public:
        virtual void read(Stream& stream);
        virtual void write(Stream& stream);
    };
    
  • the library may allow (or require) you to register "serializers" for a given type. They can be implemented by deriving a class from a special base class or interface, and then registering it for the given type:

    #define SUPERMAN_CLASS_ID 111
    
    class SuperMan
    {
    public:
        virtual int getClassId()
        {
            return SUPERMAN_CLASS_ID;
        }
    };
    
    class SuperManSerializer : public Serializer
    {
    public:
        virtual void* read(Stream& stream);
        virtual void write(Stream& stream, void* object);
    };
    
    int main()
    {
        register_class_serializer(SUPERMAN_CLASS_ID, new SuperManSerializer());
    }
    
  • serializers can also be implemented using functors, e.g. lambdas:

    int main()
    {
        register_class_serializer(SUPERMAN_CLASS_ID,
                                  [](Stream&, const SuperMan&) {},
                                  [](Stream&) -> SuperMan { return SuperMan(); });
    }
    
  • instead of passing a serializer object to some function, it may be enough to pass its type to a special template function:

    int main()
    {
        register_class_serializer<SuperManSerializer>();
    }
    
  • the class should provide overloaded operators like '<<' and '>>'. The first argument is some stream class, and the second is our class instance. The stream can be a standard stream (std::istream / std::ostream), but this conflicts with the default use of these operators: converting to and from a user-friendly text format. Because of this the stream class is usually a dedicated one (it can wrap a standard stream though), or the library supports an alternative method if the text-format << also has to be supported.

    class SuperMan
    {
    public:
        friend Stream& operator>>(Stream& stream, SuperMan& obj);
        friend Stream& operator<<(Stream& stream, const SuperMan& obj);
    };
    
  • there should be a specialization of some class template for our class type. This solution can be used together with the << and >> operators - the library first tries to use this template, and falls back to the operators if it is not specialized (this can be implemented as the default template version, or using SFINAE):

    // default implementation
    template<class T>
    class Serializable
    {
    public:
        void read(Stream& stream, T& val)
        {
            stream >> val;
        }
        void write(Stream& stream, const T& val)
        {
            stream << val;
        }
    };
    
    // specialization for given class
    template<>
    class Serializable<SuperMan>
    {
    public:
        void read(Stream& stream, SuperMan& val);
        void write(Stream& stream, const SuperMan& val);
    };
    
  • instead of a class template, the library may also use a C-style interface with global overloaded functions:

    template<class T>
    void read(Stream& stream, T& val);
    template<class T>
    void write(Stream& stream, const T& val);
    
    template<>
    void read(Stream& stream, SuperMan& val);
    template<>
    void write(Stream& stream, const SuperMan& val);
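
As a rough illustration of the first two approaches in the list, here is a minimal sketch of the abstract-interface variant. It assumes that the hypothetical Stream is simply a standard stream and that SuperMan keeps its abilities in a vector; the roundTrip helper and the text format are made up for the example and do not come from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical interface: any class that wants to be serialized derives from it.
class Serializable
{
public:
    virtual ~Serializable() {}
    virtual void read(std::istream& in) = 0;
    virtual void write(std::ostream& out) const = 0;
};

class SuperMan : public Serializable
{
public:
    std::vector<std::int64_t> abilities; // assumed storage, for illustration only

    // Writes the element count followed by each ability as text.
    void write(std::ostream& out) const override
    {
        out << abilities.size();
        for (std::int64_t a : abilities)
            out << ' ' << a;
    }

    // Reads back exactly what write() produced.
    void read(std::istream& in) override
    {
        std::size_t n = 0;
        in >> n;
        abilities.assign(n, 0);
        for (std::size_t i = 0; i < n; ++i)
            in >> abilities[i];
    }
};

// Serializes `original` into a string buffer and restores a copy from it.
SuperMan roundTrip(const SuperMan& original)
{
    std::stringstream buffer;
    original.write(buffer);
    SuperMan copy;
    copy.read(buffer);
    return copy;
}
```

Unlike Java's marker interface, deriving from such a class does nothing on its own: every class still has to spell out which members are written and in what order.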
    

The C++ language is flexible, so the above list is certainly not complete. I am convinced it would be possible to invent other solutions.
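
The registration-based approaches from the list could look roughly like the sketch below. It is only one possible design under stated assumptions: the registry is keyed by std::type_index instead of a hand-assigned SUPERMAN_CLASS_ID, the stream is a standard stream, and the names registry, save, load and registerSuperMan are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical registry entry: a pair of type-erased functors.
struct SerializerEntry
{
    std::function<void(std::ostream&, const void*)> write;
    std::function<void(std::istream&, void*)> read;
};

inline std::map<std::type_index, SerializerEntry>& registry()
{
    static std::map<std::type_index, SerializerEntry> instance;
    return instance;
}

// Wraps the typed functors so they can be stored behind void pointers.
template <class T, class Writer, class Reader>
void register_class_serializer(Writer writer, Reader reader)
{
    registry()[std::type_index(typeid(T))] = SerializerEntry{
        [writer](std::ostream& out, const void* obj) { writer(out, *static_cast<const T*>(obj)); },
        [reader](std::istream& in, void* obj) { reader(in, *static_cast<T*>(obj)); }};
}

// Looks up the serializer for T; throws std::out_of_range if none is registered.
template <class T>
void save(std::ostream& out, const T& value)
{
    registry().at(std::type_index(typeid(T))).write(out, &value);
}

template <class T>
void load(std::istream& in, T& value)
{
    registry().at(std::type_index(typeid(T))).read(in, &value);
}

// The class itself knows nothing about serialization.
struct SuperMan
{
    int fingers = 10;
    std::string name;
};

void registerSuperMan()
{
    register_class_serializer<SuperMan>(
        [](std::ostream& out, const SuperMan& s) { out << s.fingers << ' ' << s.name; },
        [](std::istream& in, SuperMan& s) { in >> s.fingers >> s.name; });
}
```

The trade-off compared with the inheritance-based variants is that the class stays untouched, but forgetting to register a type is only detected at run time.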

Daniel Frużyński answered Nov 03 '22 00:11


Luckily... C++ does not impose a default mechanism for serialization of a class hierarchy. (I wouldn't mind it supplying an optional mechanism supplied by a special base type in the standard library or something, but overall this could put limits on existing ABIs)

Yes, serialization is incredibly important and powerful in modern software engineering. I use it any time I need to translate a class hierarchy to and from some form of runtime-consumable data. The mechanism I always choose is based on some form of reflection. More on this below.

You may also want to look here for an idea of the complexities to consider and if you really wanted to verify against the standard you could purchase a copy here. It looks like the working draft for the next standard is on github.

Application specific systems

C and C++ give the author of the application the freedom to select the mechanics behind many of the technologies people take for granted in newer and often higher-level languages: reflection (RTTI), exceptions, and resource/memory management (garbage collection, RAII, etc.). These systems can all potentially impact the overall quality of a particular product.

I have worked on everything from real time games, embedded devices, mobile apps, to web applications and the overall goals of the particular project vary between them all.

Often for real-time, high-performance games you will explicitly disable RTTI (it isn't very useful in C++ anyway, to be honest) and possibly even exceptions. Many people don't want the overhead exceptions produce, and if you were really crazy you could implement your own form from long jumps and such. For me, exceptions create an invisible interface that often causes bugs people wouldn't even expect to be possible, so I often avoid them anyway in favor of more explicit logic.

Garbage collection isn't included in C++ by default either, and in real-time games this is a blessing. Sure, you can have incremental GC and other optimized approaches, which I have seen many games use (often it is a modification of an existing GC, like the one used in Mono for C#). Many games use pooling and, for C++, often RAII driven by smart pointers. It isn't unusual to have different systems with different patterns of memory usage either, which can be optimized in different ways. The point is that some applications care more than others about the nitty-gritty details.

General idea of automatic serialization of type hierarchy

The general idea of an automatic serialization system of type hierarchies is to use a reflection system that can query type information at runtime from a generic interface. My solution below relies on building that generic interface by extending upon some base type interfaces with the help of the macros. In the end you basically get a dynamic vtable of sorts that you can iterate by index or query by string names of members/types.

I also use a base reflection reader/writer type that exposes some iostream interfaces for derived formatters to override. I currently have a BinaryObjectIO, JSONObjectIO, and ASTObjectIO, but it is trivial to add others. The point of this is to remove the responsibility of serializing to a particular data format from the hierarchy and put it into the serializer.
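
To make the separation of data format and hierarchy concrete, here is a small sketch. It is not the author's BinaryObjectIO/JSONObjectIO code: ObjectWriter, JsonWriter, TextWriter and the describe method are hypothetical names, and only int members are handled to keep it short.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical writer interface: the format lives here, not in the hierarchy.
class ObjectWriter
{
public:
    virtual ~ObjectWriter() {}
    virtual void begin(const std::string& typeName) = 0;
    virtual void field(const std::string& name, int value) = 0;
    virtual void end() = 0;
    std::string str() const { return mOut.str(); }

protected:
    std::ostringstream mOut;
};

// Emits a flat JSON object.
class JsonWriter : public ObjectWriter
{
public:
    void begin(const std::string& typeName) override
    {
        mOut << "{\"type\":\"" << typeName << "\"";
    }
    void field(const std::string& name, int value) override
    {
        mOut << ",\"" << name << "\":" << value;
    }
    void end() override { mOut << "}"; }
};

// Emits a simple "Type name=value ..." line.
class TextWriter : public ObjectWriter
{
public:
    void begin(const std::string& typeName) override { mOut << typeName; }
    void field(const std::string& name, int value) override
    {
        mOut << ' ' << name << '=' << value;
    }
    void end() override {}
};

struct ASTNode
{
    int mLine = 0;
    int mColumn = 0;

    // The node only describes its members; the writer decides the format.
    void describe(ObjectWriter& w) const
    {
        w.begin("ASTNode");
        w.field("mLine", mLine);
        w.field("mColumn", mColumn);
        w.end();
    }
};
```

Adding another format then means adding another writer class, without touching ASTNode or any other type in the hierarchy.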

Reflection at the language level

In many situations the application knows what data it would like to serialize and there is no reason to build it into every object in the language. Many modern languages include RTTI even in the basic types of the system (if they are type based common intrinsics would be int, float, double, etc.). This requires extra data to be stored for everything in the system regardless of the usage by the application. I'm sure many modern compilers can at times optimize away some with tree shaking and such, but you can't guarantee that either.

A Declarative approach

The methods already mentioned are all valid use cases, although they lack some flexibility by having the hierarchy handle the actual serialization task. This can also bloat your code with boilerplate stream manipulation on the hierarchy.

I personally prefer a more declarative approach via reflection. What I have done in the past, and continue to do in some situations, is create a base Reflectable type in my system. I end up using template metaprogramming to help with some boilerplate logic, as well as the preprocessor for string concatenation macros. The end result is a base type that I derive from, a reflectable macro declaration to expose the interface, and a reflectable macro definition to implement the guts (tasks like adding the registered member to the type's lookup table).

So I normally end up with something that looks like this in the header:

class ASTNode : public Reflectable 
{

...

public:
    DECLARE_CLASS

    DECLARE_MEMBER(mLine,int)
    DECLARE_MEMBER(mColumn,int)

...

};

Then something like this in the cpp:

BEGIN_REGISTER_CLASS(ASTNode,Reflectable);
REGISTER_MEMBER(ASTNode,mLine);
REGISTER_MEMBER(ASTNode,mColumn);
END_REGISTER_CLASS(ASTNode);

ASTNode::ASTNode() 
: mLine( 0 )
, mColumn( 0 )
{
}

I can then use the reflection interface directly with some methods such as:

int id = myreflectedObject.Get<int>("mID");
myreflectedObject.Set( "mID", 6 );

But much more commonly I just iterate some "Traits" data that I have exposed with another interface:

ReflectionInfo::RefTraitsList::const_iterator it = info->getReflectionTraits().begin();

Currently the traits object looks something like this:

class ReflectionTraits
{
public:
    ReflectionTraits( const uint8_t& type, const uint8_t& arrayType, const char* name, const ptrType_t& offset );

    std::string getName() const{ return mName; }
    ptrType_t getOffset() const{ return mOffset; }
    uint8_t getType() const{ return mType; }
    uint8_t getArrayType() const{ return mArrayType; }

private:
    std::string     mName;
    ptrType_t       mOffset;
    uint8_t         mType;
    uint8_t         mArrayType; // if mType == TYPE_ARRAY this will give the type of the underlying data in the array
};
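
A minimal sketch of how a list of such traits might drive serialization is shown below. It is a simplification under stated assumptions, not the author's ReflectionInfo API: the Trait struct, the FieldType enum, the hand-written astNodeTraits table and the dump function are all hypothetical, ASTNode is reduced to a plain struct, and offsets come from the standard offsetof macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

enum FieldType { TYPE_INT, TYPE_FLOAT };

// Simplified stand-in for a traits entry: name, byte offset and type tag.
struct Trait
{
    std::string name;
    std::size_t offset;
    FieldType type;
};

struct ASTNode
{
    int mLine;
    int mColumn;
    float mWeight; // extra member added only to show a second type
};

// Hand-written table; a macro-based system would generate these entries.
const std::vector<Trait>& astNodeTraits()
{
    static const std::vector<Trait> traits = {
        {"mLine", offsetof(ASTNode, mLine), TYPE_INT},
        {"mColumn", offsetof(ASTNode, mColumn), TYPE_INT},
        {"mWeight", offsetof(ASTNode, mWeight), TYPE_FLOAT},
    };
    return traits;
}

// Walks the traits and writes "name=value;" for each member.
std::string dump(const void* object, const std::vector<Trait>& traits)
{
    std::ostringstream out;
    const char* base = static_cast<const char*>(object);
    for (const Trait& t : traits)
    {
        out << t.name << '=';
        if (t.type == TYPE_INT)
        {
            int v;
            std::memcpy(&v, base + t.offset, sizeof v); // copy avoids aliasing issues
            out << v;
        }
        else
        {
            float v;
            std::memcpy(&v, base + t.offset, sizeof v);
            out << v;
        }
        out << ';';
    }
    return out.str();
}
```

The serializer never names a concrete member; it only needs the object's address and the traits list, which is what lets one generic routine handle every registered type.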

I have actually come up with improvements to my macros that allow me to simplify this a bit... but those are taken from an actual project I'm working on currently. I'm developing a programming language using Flex, Bison, and LLVM that compiles to C ABI and webassembly. I'm hoping to open source it soon enough, so if you are interested in the details let me know.

The thing to note here is that "Traits" information is metadata that is accessible at runtime and describes the member and is often much larger for general language level reflection. The information I have included here was all I needed for my reflectable types.

The other important aspect to keep in mind when serializing any data is version information. The above approach will deserialize data just fine until you start changing the internal data structure. You could, however, include a post and possibly pre data serialization hook mechanism with your serialization system so you can fix up data to comply with newer versions of types. I have done this a few times with setups like this and it works really well.
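
One way such a version check and post-load hook might look is sketched here. The version numbers, the text format and the postLoad name are assumptions for illustration; the idea is only that the stored version decides which members are read and which are patched afterwards.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

struct ASTNode
{
    int mLine = 0;
    int mColumn = 0; // assumed to have been added in version 2 of the format

    static constexpr int kCurrentVersion = 2;

    // Always writes the newest layout, prefixed with its version.
    void write(std::ostream& out) const
    {
        out << kCurrentVersion << ' ' << mLine << ' ' << mColumn;
    }

    // Reads any known version, then runs the post-load hook.
    bool read(std::istream& in)
    {
        int version = 0;
        if (!(in >> version) || version < 1 || version > kCurrentVersion)
            return false;
        in >> mLine;
        if (version >= 2)
            in >> mColumn;
        postLoad(version);
        return !in.fail();
    }

private:
    // Hypothetical fix-up hook: fills in members that old data lacks.
    void postLoad(int version)
    {
        if (version < 2)
            mColumn = 1;
    }
};
```

Data written by the old layout ("1 12") still loads, with mColumn set by the hook, while data with an unknown future version is rejected instead of being misread.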

One final note about this technique is that you are explicitly controlling what is serialized here. You can pick and choose the data you want to serialize and the data that may just be keeping track of some transient object state.

C++ Lax guarantees

One thing to note: C++ is VERY lax about what data actually looks like, so you often have to make some platform-specific choices (this is probably one of the main reasons a standard system isn't provided). You can actually do a great deal at compile time with template metaprogramming, but sometimes it is easier to just assume your char to be 8 bits in length. Yes, even this simple assumption isn't 100% universal in C++; luckily, in most situations it holds.
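
If you do rely on such assumptions, it can help to state them in code so that an unusual platform fails at compile time. The sketch below checks CHAR_BIT and writes a 32-bit value byte by byte so the stored order does not depend on the host; putU32 and getU32 are hypothetical helper names, and little-endian is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Fail the build early instead of silently producing an incompatible format.
static_assert(CHAR_BIT == 8, "serializer assumes 8-bit bytes");

// Stores a 32-bit value in little-endian order regardless of host byte order.
inline void putU32(unsigned char* dst, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        dst[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
}

// Rebuilds the value from the four little-endian bytes.
inline std::uint32_t getU32(const unsigned char* src)
{
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(src[i]) << (8 * i);
    return v;
}
```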

The approach I use also does some non-standard casting of NULL pointers to determine memory layout (again for my purposes this is the nature of the beast). The following is an example snippet from one of the macro implementations to calculate the member offset in the type where CLASS is provided by the macro.

(ptrType_t)&reinterpret_cast<ptrType_t&>((reinterpret_cast<CLASS*>(0))->member)
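
For comparison, the standard offsetof macro from <cstddef> gives the same kind of information without dereferencing a null pointer. The sketch below uses a hypothetical MEMBER_OFFSET wrapper and a plain struct. Note that offsetof is only guaranteed for standard-layout types; for a class with virtual functions or a non-trivial base its use is conditionally-supported, which may be why a hand-rolled cast ends up being used in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct ASTNode
{
    int mLine;
    int mColumn;
};

// Hypothetical wrapper so registration macros can stay format-agnostic.
#define MEMBER_OFFSET(CLASS, member) offsetof(CLASS, member)

static_assert(std::is_standard_layout<ASTNode>::value,
              "offsetof is only guaranteed for standard-layout types");

// Offsets are compile-time constants and can be stored in a traits table.
constexpr std::size_t lineOffset = MEMBER_OFFSET(ASTNode, mLine);
constexpr std::size_t columnOffset = MEMBER_OFFSET(ASTNode, mColumn);
```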

A general warning about reflection

The biggest issue with reflection is how powerful it can be. You can quickly turn an easily maintainable codebase into a huge mess with too much inconsistent usage of reflection.

I personally reserve reflection for lower level systems (primarily serialization) and avoid using it for runtime type checking for business logic. Dynamic dispatching with language constructs such as virtual functions should be preferred to reflection type check conditional jumps.

Issues are even harder to track down if the language has inherent all-or-nothing support for reflection. In C#, for example, you cannot guarantee, given a random codebase, that a function isn't being used simply by letting the compiler alert you of any usage. Not only can you invoke the method via a string from the codebase or, say, from a network packet; you could also break the ABI compatibility of some other unrelated assembly that reflects on the target assembly. So again, use reflection consistently and sparingly.

Conclusion

There is currently no standard equivalent to the common paradigm of a serializable class hierarchy in C++, but it can be added much like any other system you see in newer languages. After all everything eventually translates down to simplistic machine code that can be represented by the binary state of the incredible array of transistors included in your CPU die.

I'm not saying that everyone should roll their own here by any means. It is complicated and error prone work. I just really liked the idea and have been interested in this sort of thing for a while now anyways. I'm sure there are some standard fallbacks people use for this sort of work. The first place to look for C++ would be boost as you mentioned above.

If you do a search for "C++ Reflection" you will see several examples of how others achieve a similar result.

A quick search pulled up this as one example.

Matthew Sanders answered Nov 03 '22 01:11