I'm currently quite curious about how other programmers organise data into files. Can anyone recommend any good articles or books on best practices for creating file structures?
For example, if you've created your own piece of software for whatever purpose, do you leave the saved data as plain text, serialize it, or encode it as XML, and why?
Are there any secrets I've missed?
Generally, go with the simplest thing that can possibly work, at least at first. Consider, e.g., UNIX, where most of the configuration files are nothing but whitespace-delimited fields, or fields delimited with another character (like /etc/passwd, which uses ":" delimiters because the GCOS field can contain blanks).
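As a rough illustration of how little code a delimiter-separated format needs, here is a minimal Python sketch that parses and re-writes one passwd-style line. The names `PasswdEntry`, `parse_passwd_line` and `format_passwd_line` are made up for this example, and the sample line is invented; it assumes the usual seven colon-separated fields.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One record of a passwd-style file (hypothetical helper type)."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    # Split on ":" rather than whitespace, so blanks inside a field survive.
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


def format_passwd_line(entry: PasswdEntry) -> str:
    # Writing is just the reverse: join the fields with the same delimiter.
    return ":".join(str(field) for field in entry)


# Invented sample line; note the blank inside the fifth field.
sample = "alice:x:1000:1000:Alice Example,Room 1:/home/alice:/bin/sh"
entry = parse_passwd_line(sample)
```

The whole format is one `split` and one `join`, which is a large part of why it has lasted so long.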
If your data needs a lot more structure, then ask yourself "what tools can I use easily?" Python and Ruby have JSON and YAML, for example.
XML is basically useful if you have lots of XML-based stuff already, OR you expect to transform the XML to a displayable form in a browser. Otherwise, it's usually very heavyweight (code size, complexity) for what you get from it.
No matter which format you choose, remember to store some kind of version number inside it (I'm pretty sure that you'll have to introduce some changes later).
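One possible way to do this, sketched in Python with JSON as the container: the top-level `version` key, the `settings` payload, and the `font_size` field added in version 2 are all invented for the illustration, not part of any real format.

```python
import json

CURRENT_VERSION = 2  # hypothetical: version 2 added a "font_size" setting


def dumps_settings(settings: dict) -> str:
    # Always write the version next to the payload.
    return json.dumps({"version": CURRENT_VERSION, "settings": settings}, indent=2)


def loads_settings(text: str) -> dict:
    doc = json.loads(text)
    version = doc.get("version")
    if version == 1:
        # Upgrade an old file in memory by filling in the field it lacks.
        settings = dict(doc["settings"])
        settings.setdefault("font_size", 12)
        return settings
    if version == CURRENT_VERSION:
        return doc["settings"]
    # Unknown (probably newer) version: fail loudly instead of guessing.
    raise ValueError(f"unsupported file version: {version!r}")


old_file = '{"version": 1, "settings": {"font": "mono"}}'
upgraded = loads_settings(old_file)
```

Because the reader checks the number first, a file written by a newer program is rejected cleanly rather than half-parsed.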
Format depends heavily on the application and amount of data. For some applications XML is appropriate, for other applications fixed size structs stored in a binary file are good.
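For the fixed-size struct case, a minimal Python sketch using the standard `struct` module might look like the following. The record layout (a 32-bit id, a 16-byte name, a 64-bit float) and the helper names are assumptions chosen for the example.

```python
import io
import struct

# Hypothetical layout: little-endian uint32 id, 16-byte name, float64 value.
RECORD = struct.Struct("<I16sd")


def pack_record(rec_id: int, name: str, value: float) -> bytes:
    raw = name.encode("utf-8")
    if len(raw) > 16:
        raise ValueError("name does not fit in 16 bytes")
    # struct pads the "16s" field with NUL bytes up to its fixed width.
    return RECORD.pack(rec_id, raw, value)


def unpack_record(data: bytes):
    rec_id, raw, value = RECORD.unpack(data)
    return rec_id, raw.rstrip(b"\x00").decode("utf-8"), value


def read_record(f, index: int):
    # Every record has the same size, so record N starts at N * RECORD.size.
    f.seek(index * RECORD.size)
    return unpack_record(f.read(RECORD.size))


# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
for rec in [(1, "alpha", 1.5), (2, "beta", 2.5), (3, "gamma", 3.5)]:
    buf.write(pack_record(*rec))
second = read_record(buf, 1)
```

The payoff of fixed-size records is random access: any record can be read with a single seek, without scanning the file.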
I use many different formats, depending on the situation, for example:
Unless you have unique requirements, use something for which there is already a mature library, so you can avoid writing your own parsing code. That means XML/JSON, etc, like people have said.
One other nice one is Google's protocol buffers (http://code.google.com/p/protobuf). There you write a common message definition and the protocol buffer compiler generates objects for filling out, serializing, and deserializing the data for you. Typically the format is binary, but you can use their TextFormat class to write JSON-like plain text too. The nice thing about protobufs is that the versioning code is generated for you. In version 2 of your file format, all you have to do is add fields to the .proto definition file. The new version can read the old file format, and just leaves the new fields blank. It's not exactly what protobufs were designed for, but they make an easy, efficient binary file format for custom messages, and the code is generated for you.
Also see Facebook's Thrift, now in the Apache incubator.