
"Best" Input File Formats for C++? [closed]

I am starting work on a new piece of software that will end up needing some robust and expandable file IO. There are a lot of formats out there: XML, JSON, INI, etc. However, each has its pluses and minuses, so I thought I would ask for some community input.

Here are some rough requirements:

  1. The format is a "standard"... I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but it should be something a new user could Google and get some information on, and that may have some support tools (editors) beyond vi. (Though the software's users will generally be computer savvy and happy to use vi.)
  2. Easily integrates with C++. I don't want to have to pull along a 100 MB library and three different compilers to get it up and running.
  3. Supports tabular input (2d, n-dimensional)
  4. Supports POD types
  5. Can expand as more inputs are required, binds well to variables, etc.
  6. Parsing speed is not terribly important
  7. Ideally, as easy to write (reflect) as it is to read
  8. Works well on Windows and Linux
  9. Supports compositing (one file referencing another file to read, and so on.)
  10. Human Readable

In a perfect world, I would use a header-only library or some clean STL implementation, but I'm fine with leveraging Boost or some small external library if it works well.

So, what are your thoughts on various formats? Drawbacks? Advantages?

Edit

Options to consider? Anything else to add?

  • XML
  • YAML
  • SQLite
  • Google Protocol Buffers
  • Boost Serialization
  • INI
  • JSON
DigitalInBlue asked Feb 05 '13 03:02


1 Answer

There is one excellent format that meets all your criteria:

SQLite!

Please read the article about using SQLite as an application file format. Also, please watch the Google Tech Talk by D. Richard Hipp (SQLite's author) about this very topic.

Now, let's see how SQLite meets your requirements:

The format is a "standard"

SQLite has become the format of choice for most mobile environments, and for many desktop apps (Firefox, Thunderbird, Google Chrome, Adobe Reader, you name it).

Easily integrates with C++

SQLite has a standard C interface, which ships as just one source file and one header file. There are C++ wrappers too.

Supports tabular input (2d, n-dimensional)

A SQLite table is as tabular as you could possibly imagine. To represent, say, 3-dimensional data, create a table with columns x, y, z, value and store your data as a set of rows like this:

x1,y1,z1,value1
x2,y2,z2,value2
...

Supports POD types

I assume by POD you mean Plain Old Data. SQLite stores integers, floating-point numbers, and text natively, and it lets you store raw bytes in BLOB fields as is.

Can expand as more inputs are required, binds well to variables

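This is where it really shines: you can add new tables or columns as your inputs grow, and prepared statements bind query parameters directly to your program's variables.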

Parsing speed is not terribly important

But SQLite's speed is superb. In fact, parsing is basically transparent.

Ideally, as easy to write (reflect) as it is to read

Just use INSERT to write and SELECT to read - what could be easier?

Works well on Windows and Linux

You bet, and all other platforms as well.

Supports compositing (one file referencing another file to read)

You can ATTACH one database to another.

Human Readable

Not in its binary form, but there are many excellent SQLite browsers/editors out there. I like SQLite Expert Personal on Windows and sqliteman on Linux. There is also an SQLite editor plugin for Firefox.


There are other advantages that SQLite gives you for free:

  • Data is indexable, which makes it very fast to search. You just cannot do this with XML, JSON, or any other text-only format.

  • Data can be edited partially, even when the amount of data is very large. You do not have to rewrite a few gigabytes just to edit one value.

  • SQLite is fully transactional: it guarantees that your data is consistent at all times. Even if your application (or the whole computer) crashes, your data will be automatically restored to the last known consistent state the next time you connect to the database.

  • SQLite stores your data verbatim: you do not need to worry about escaping junk characters in your data (including zero bytes embedded in your strings). Simply always use prepared statements; that's all it takes to make it transparent. Escaping can be a big and annoying problem when dealing with text data formats, XML in particular.

  • SQLite stores all strings in Unicode: UTF-8 (default) or UTF-16. In other words, you do not need to worry about text encodings or international support for your data format.

  • SQLite allows you to process data in small chunks (row by row, in fact), so it works well in low-memory conditions. This can be a problem for text-based formats, because they often need to load all the text into memory to parse it. Granted, there are a few efficient stream-based XML parsers out there, but in general any XML parser will be quite memory greedy compared to SQLite.

mvp answered Sep 20 '22 14:09