
Serialization for document storage

I am writing a desktop application that can open, edit, and save documents.

Those documents are described by several objects of different types that store references to each other. Of course, there is a Document class that serves as the root of this data structure.

The question is how to save this document model into a file.

What I need:

  • Support for recursive structures.
  • It must be able to open files even if they were produced from slightly different classes. My users don't want to recreate every document after every release just because I added a field somewhere.
  • It must deal with classes that are not known at compile time (for plug-in support).

What I tried so far:

  • XmlSerializer -> Fails the first and last criteria.
  • BinaryFormatter -> Fails the second criterion.

  • DataContractSerializer: Similar to XmlSerializer, but with support for cyclic (recursive) references. It was also designed with forward and backward compatibility in mind (see Data Contract Versioning).

  • NetDataContractSerializer: While DataContractSerializer still needs to know all types in advance (so it does not work very well with inheritance), NetDataContractSerializer stores type information in the output. Other than that, the two seem to be equivalent.

  • protobuf-net: I haven't had time to experiment with it yet, but it seems similar in function to DataContractSerializer while using a binary format.

Handling of unknown types

There seem to be two philosophies about what to do when the static and dynamic types differ (for example, a field of type object that holds, let's say, a Person object). Either way, the dynamic type must somehow be stored in the file.

  • Use different XML tags for different dynamic types. But since the XML tag used for a particular class might not be equal to the class name, it's only possible to go this route if the deserializer knows all possible types in advance (so that it can scan them for attributes).

  • Store the CLR type (class name, assembly name & version) during serialization. Use this info during deserialization to instantiate the right class. The types need not be known prior to deserialization.

The second one is simpler to use, but the resulting file will be CLR dependent (and less tolerant of code modifications such as renaming a class or an assembly). That's probably why XmlSerializer and DataContractSerializer choose the first way. NetDataContractSerializer is not recommended because it uses the second approach (so does BinaryFormatter, by the way).
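To illustrate the first approach, here is a minimal sketch with DataContractSerializer. The `Holder` and `Person` classes and the `KnownTypesDemo` helper are hypothetical names made up for this example. The point it tries to show is that the list of known types is an ordinary run-time argument, so "known in advance" means "known before deserialization starts", not necessarily "known at compile time"; an application could presumably build the list from its loaded plug-ins.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Holder
{
    // Static type is object; the dynamic type is only decided at run time.
    [DataMember] public object Value { get; set; }
}

[DataContract]
public class Person
{
    [DataMember] public string Name { get; set; }
}

public static class KnownTypesDemo
{
    // Writes the holder to memory and reads it back, using the supplied
    // known-type list for both directions.
    public static object RoundTrip(Holder holder, Type[] knownTypes)
    {
        var serializer = new DataContractSerializer(typeof(Holder), knownTypes);
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, holder);
            stream.Position = 0;
            return ((Holder)serializer.ReadObject(stream)).Value;
        }
    }

    public static void Main()
    {
        var holder = new Holder { Value = new Person { Name = "Ada" } };

        // Assumption: in a real application this array would be collected
        // from plug-in assemblies at start-up instead of being hard-coded.
        var restored = RoundTrip(holder, new[] { typeof(Person) });
        Console.WriteLine(restored.GetType().Name);
    }
}
```

If `Person` is left out of the list, `WriteObject` should fail with a `SerializationException`, because the serializer has no contract name to write for the dynamic type.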

Any ideas?

Stefan asked Feb 07 '10 13:02


2 Answers

The one you haven't tried is DataContractSerializer. There is a constructor that takes a bool preserveObjectReferences parameter, which should handle the first criterion.
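A minimal sketch of what that could look like. It assumes .NET 4.5 or later, where the same switch is also exposed as `DataContractSerializerSettings.PreserveObjectReferences`; the `Document` and `Page` classes and the `CycleDemo` helper are hypothetical stand-ins, not the asker's actual model.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Document
{
    [DataMember] public string Title { get; set; }
    [DataMember] public Page FirstPage { get; set; }
}

[DataContract]
public class Page
{
    // Back-reference to the root: this is what makes the graph cyclic.
    [DataMember] public Document Owner { get; set; }
}

public static class CycleDemo
{
    public static Document RoundTrip(Document doc)
    {
        // Ask the serializer to write each object once and refer to it by id
        // afterwards, instead of writing it again at every reference.
        var settings = new DataContractSerializerSettings
        {
            PreserveObjectReferences = true
        };
        var serializer = new DataContractSerializer(typeof(Document), settings);

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, doc);
            stream.Position = 0;
            return (Document)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        var doc = new Document { Title = "Report" };
        doc.FirstPage = new Page { Owner = doc };

        Document copy = RoundTrip(doc);

        // The cycle is restored: the page points back at the same root object.
        Console.WriteLine(ReferenceEquals(copy, copy.FirstPage.Owner)); // True
    }
}
```

With the setting left at its default (and no `IsReference = true` on the contracts), serializing this graph should fail with a `SerializationException` because of the cycle.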

DW. answered Oct 25 '22 21:10


The WCF data contract serializer is probably closest to your needs, although not perfect.

There is only limited support for forward compatibility (i.e. whether old versions of the program can read documents generated with a newer version). New fields are supported (via IExtensibleDataObject), but new classes and new enum values are not.
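A rough sketch of the new-field case, assuming two hypothetical releases of the same class (`PersonV1` and `PersonV2`, sharing one contract name) and a made-up namespace URI. The old release opens a file written by the new one, edits it, and saves it again; the member it does not know about is carried along in `ExtensionData`.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical "release 1" shape of the class: it has never heard of Email.
[DataContract(Name = "Person", Namespace = "urn:example:docs")]
public class PersonV1 : IExtensibleDataObject
{
    [DataMember] public string Name { get; set; }

    // Unknown elements found while reading are kept here and written back on save.
    public ExtensionDataObject ExtensionData { get; set; }
}

// Hypothetical "release 2" shape: same contract name, one added member.
[DataContract(Name = "Person", Namespace = "urn:example:docs")]
public class PersonV2 : IExtensibleDataObject
{
    [DataMember] public string Name { get; set; }

    // Order places the added member after the original ones in the output.
    [DataMember(Order = 2)] public string Email { get; set; }

    public ExtensionDataObject ExtensionData { get; set; }
}

public static class VersioningDemo
{
    static byte[] Write<T>(T value)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return stream.ToArray();
        }
    }

    static T Read<T>(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream(data))
        {
            return (T)serializer.ReadObject(stream);
        }
    }

    // New release writes, old release opens, edits and saves, new release reads.
    public static PersonV2 RoundTripThroughOldVersion(PersonV2 original)
    {
        byte[] newFile = Write(original);
        PersonV1 opened = Read<PersonV1>(newFile);
        opened.Name = opened.Name.ToUpperInvariant(); // an edit made by the old release
        byte[] resaved = Write(opened);
        return Read<PersonV2>(resaved);
    }

    public static void Main()
    {
        var result = RoundTripThroughOldVersion(
            new PersonV2 { Name = "Ada", Email = "ada@example.com" });
        Console.WriteLine(result.Name + " / " + result.Email);
    }
}
```

The edit made by the old release and the `Email` value it never knew about are both expected to be present in the final object. This only covers added members; it does not help with the new classes or new enum values mentioned above.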

oefe answered Oct 25 '22 19:10