Versioning friendly, extendible binary file format

In the project I'm currently working on, there is a need to save a sizable data structure to disk (edit: think dozens of MBs). Being an optimist, I thought that there must be a standard solution for such a problem; however, so far I haven't found one that satisfies the following requirements:

  1. .NET 2.0 support, preferably with a FOSS implementation
  2. Version-friendly (this should be interpreted as: reading an old version of the format should be relatively simple if the changes in the underlying data structure are simple, say adding/dropping fields)
  3. Ability to do some form of random access where part of the data can be extended after initial creation, without the need to deserialize the collection created up to this point in time (think of this as extending intermediate results)
  4. Space- and time-efficient (XML has been excluded as an option given this requirement)
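
Requirements 2 and 3 can be made concrete with a small sketch. What follows is a minimal, hypothetical layout written in Python for brevity (it is not .NET 2.0 code and not an existing library or standard): a header carrying a format version, then length-prefixed records whose fields are tagged with numeric IDs. A reader skips tags it does not recognize, which covers adding fields (requirement 2), and new records are appended at the end without deserializing the existing ones (requirement 3). The magic bytes, the tag numbers, and the names write_header, append_record and read_records are all illustrative assumptions.

```python
import io
import struct

MAGIC = b"VREC"          # hypothetical 4-byte file signature
FORMAT_VERSION = 2       # bumped whenever the record layout changes

# Hypothetical field tags; readers skip tags they don't recognize.
TAG_NAME = 1
TAG_VALUE = 2
TAG_COMMENT = 3          # pretend this field was added in version 2


def write_header(f):
    """Write the signature followed by a little-endian uint16 format version."""
    f.write(MAGIC + struct.pack("<H", FORMAT_VERSION))


def append_record(f, fields):
    """Append one record: a uint32 length prefix, then (tag, length, bytes) triples."""
    body = b"".join(
        struct.pack("<HI", tag, len(data)) + data for tag, data in fields.items()
    )
    f.seek(0, io.SEEK_END)   # appending never rewrites existing records
    f.write(struct.pack("<I", len(body)) + body)


def read_records(f, known_tags):
    """Return (format_version, list of {tag: bytes}) keeping only known tags."""
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("not a VREC file")
    (version,) = struct.unpack("<H", f.read(2))
    records = []
    while True:
        prefix = f.read(4)
        if not prefix:           # end of file
            break
        (size,) = struct.unpack("<I", prefix)
        body = f.read(size)
        fields, pos = {}, 0
        while pos < size:
            tag, length = struct.unpack_from("<HI", body, pos)
            pos += 6             # 2-byte tag + 4-byte length
            if tag in known_tags:
                fields[tag] = body[pos:pos + length]
            pos += length        # unknown tags are skipped, not an error
        records.append(fields)
    return version, records


# Usage: an older reader that only knows TAG_NAME and TAG_VALUE.
buf = io.BytesIO()
write_header(buf)
append_record(buf, {TAG_NAME: b"a", TAG_VALUE: b"\x01"})
append_record(buf, {TAG_NAME: b"b", TAG_COMMENT: b"new in v2"})
version, recs = read_records(buf, {TAG_NAME, TAG_VALUE})
# recs == [{1: b"a", 2: b"\x01"}, {1: b"b"}]
```

Dropped fields are handled on the reading side by treating a missing tag as a default value. Extending a record in the middle of the file in place would need an index or chained chunks, which this sketch does not attempt.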

Options considered so far:

  • XmlSerializer: was turned down because XML serialization does not meet requirements 3 and 4.
  • SerializableAttribute: does not support requirements 2 and 3.
  • Protocol Buffers: was turned down based on the documentation about Large Data Sets: since it suggests adding another layer on top, this would call for additional complexity, which I would rather have handled by the file format itself.
  • HDF5, EXI: do not seem to have .NET implementations.
  • SQLite/SQL Server Compact Edition: the data structure at hand would result in a pretty complex table structure that seems too heavyweight for the intended use.
  • BSON: does not appear to support requirement 3.
  • Fast Infoset: only seems to have paid .NET implementations.

Any recommendations or pointers are greatly appreciated. Furthermore, if you believe any of the information above is not true, please provide pointers/examples to prove me wrong.

Asked by Bas Bossink on Mar 29 '10 at 20:03


1 Answer

Have you considered using SQL Server Compact Edition?

  1. It has plenty of .NET support
  2. The versioning of the schema, and the ability of new versions of your application to handle old schemas, would be entirely in your control. Upgrading SQL Server Compact itself should be fairly seamless, unless your application relies on features of a newer version that did not exist in the older one.
  3. Most of the SQL syntax is available to you for querying.
  4. As the name suggests, this edition of SQL Server was designed for embedded systems, which can include applications that want to avoid installing SQL Express or the full-blown version of SQL Server.
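
Point 2 can be illustrated with the usual pattern for in-application schema versioning: store a schema version number in the database file and apply upgrade steps in order when an older file is opened. The sketch below uses SQLite through Python's built-in sqlite3 module purely as a stand-in, since SQL Server Compact is not reachable from Python's standard library. SQL Server Compact has no PRAGMA user_version, so there you would keep the number in a small table of your own. The table, column, and function names are hypothetical.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE result (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE result ADD COLUMN comment TEXT",  # added in schema version 2
]


def open_store(path):
    """Open the database and bring its schema up to the latest version."""
    conn = sqlite3.connect(path)
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for step in range(version, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[step])
        # PRAGMA does not accept bound parameters, so the integer is formatted in.
        conn.execute(f"PRAGMA user_version = {step + 1}")
    conn.commit()
    return conn


conn = open_store(":memory:")
conn.execute("INSERT INTO result (name, comment) VALUES (?, ?)", ("a", "note"))
conn.commit()
```

Because each step only runs when the stored version is lower, opening a file that is already current does nothing, and an old file is brought up to date the first time a newer build of the application opens it.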

Now, this would have the same issues as SQLite in that the data structure, from what you have told us, could get complicated, but that will be true even if you roll your own binary format.

By the way, it occurs to me that you haven't clarified what exactly you mean by "sizeable". If it means close to or more than 4 GB, SQL Compact will obviously not work, nor will a host of other database file formats.

EDIT: I notice that you added SQL Compact Edition to your "too heavyweight" list after my post. SQL Compact requires only 5 MB of RAM and 2 MB of disk storage, depending on the size of the database, so the problem cannot be that it is heavyweight. As for your second point, that the data structure would result in a pretty complicated table structure: if that is true, I suspect it will be true of any relational database product, and rolling your own binary format will be even more complicated. Given that, you might look at non-relational database products such as MongoDB.

Answered by Thomas on Sep 22 '22 at 17:09