
Best practices for custom file structures

Tags:

file

I'm curious about how other programmers organise data into files. Can anyone recommend any good articles or books on best practices for creating file structures?

For example, if you've created your own piece of software for whatever purpose, do you leave the saved data as plain text, serialize it, or encode it as XML, and why?

Are there any secrets I've missed?

Andy asked Mar 01 '09 22:03


3 Answers

Generally, go with the simplest thing that can possibly work, at least at first. Consider, e.g., UNIX, where most of the configuration files are nothing but whitespace-delimited fields, or fields delimited with another character (like /etc/passwd, which uses ":" delimiters because the GCOS field can contain blanks).
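
To illustrate how little code a delimited format needs, here is a minimal Python sketch that parses one /etc/passwd-style line. The `PasswdEntry` type, the `parse_passwd_line` helper, and the sample line are made up for this example; only the seven colon-separated fields come from the real file format.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """One record of a colon-delimited, passwd-style file (illustrative type)."""
    name: str
    password: str
    uid: int
    gid: int
    gcos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    # Split on ":" rather than whitespace, because the GCOS field may contain blanks.
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    name, password, uid, gid, gcos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gcos, home, shell)


# Made-up sample line, not taken from a real system.
sample = "andy:x:1000:1000:Andy Example,,,:/home/andy:/bin/bash"
entry = parse_passwd_line(sample)
print(entry.name, entry.uid, entry.gcos)
```

The whole "parser" is a split plus a field count check, which is the main appeal of this kind of format.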

If your data needs a lot more structure, then ask yourself "what tools can I use easily?" Python and Ruby have JSON and YAML, for example.
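
As a rough sketch of the "use the tools you already have" point, the snippet below round-trips nested data through JSON using only Python's standard `json` module. The `settings` dictionary is an invented example. (YAML works similarly in Python but needs a third-party package.)

```python
import json

# Invented example data with some nesting, which a flat delimited file handles poorly.
settings = {
    "window": {"width": 800, "height": 600},
    "recent_files": ["a.txt", "b.txt"],
}

# Serialize to human-readable text; sort_keys keeps the output stable between saves.
text = json.dumps(settings, indent=2, sort_keys=True)

# Parse it back into plain dicts and lists.
restored = json.loads(text)
print(restored["window"]["width"])
```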

XML is basically useful if you have lots of XML-based stuff already, OR you expect to transform the XML to a displayable form in a browser. Otherwise, it's usually very heavyweight (code size, complexity) for what you get from it.

Charlie Martin answered Nov 18 '22 00:11


No matter which format you choose, remember to store some kind of version number inside it (you will almost certainly have to change the format later).
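
One possible way to use a stored version number is sketched below in Python, assuming a JSON file with a top-level "version" key. The field names and the v1-to-v2 change (renaming "name" to "title") are hypothetical and only serve to show the upgrade step.

```python
import json

CURRENT_VERSION = 2  # version written by this (hypothetical) release


def save_document(doc: dict) -> str:
    # Always stamp the file with the version of the format being written.
    return json.dumps({**doc, "version": CURRENT_VERSION})


def load_document(text: str) -> dict:
    doc = json.loads(text)
    # Files written before versioning was introduced are treated as version 1.
    version = doc.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"file version {version} is newer than supported {CURRENT_VERSION}")
    if version == 1:
        # Hypothetical change: version 2 renamed "name" to "title".
        doc["title"] = doc.pop("name", "")
        doc["version"] = 2
    return doc


old_file = '{"version": 1, "name": "My notes"}'
print(load_document(old_file))
```

Without the version field, the loader would have to guess which layout it is looking at.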

The format depends heavily on the application and the amount of data. For some applications XML is appropriate; for others, fixed-size structs stored in a binary file are good.

I use many different formats, depending on the situation, for example:

  • plain text file (delimited) for storing datasets for Matlab and R analysis
  • binary files - for storing fixed-size structures (with dynamically sized records, random access gets difficult unless you maintain a separate array of offsets for the elements). On the positive side, you get performance and space efficiency (why do most databases store data in a binary format?), but it is not very good for human beings to work with. Remember endianness.
  • XML - usually for configuration data, or data that I want to give to other users' applications (along with an XSD). The other side can write a nice XSLT transformation or consume the data in some other manner (of course they could do the same with plain text or binary data, given the format description)
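
The binary-file point can be sketched in Python with the standard `struct` module. The record layout below (a 32-bit id, a 64-bit float, and an 8-byte name) is invented for illustration. The "<" prefix fixes the byte order to little-endian with no padding, so the file reads the same on any machine, and because every record has the same size, record N sits at offset N * record size.

```python
import io
import struct

# Hypothetical layout: uint32 id, float64 value, 8-byte name.
# "<" means little-endian with no alignment padding.
RECORD = struct.Struct("<Id8s")


def write_records(f, records):
    for rec_id, value, name in records:
        # "8s" pads short names with NUL bytes and truncates long ones.
        f.write(RECORD.pack(rec_id, value, name.encode("ascii")))


def read_record(f, index):
    # Fixed-size records allow random access with a single seek.
    f.seek(index * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(f"no record at index {index}")
    rec_id, value, raw_name = RECORD.unpack(data)
    return rec_id, value, raw_name.rstrip(b"\0").decode("ascii")


# An in-memory buffer stands in for a file opened in binary mode.
buf = io.BytesIO()
write_records(buf, [(1, 1.5, "alpha"), (2, 2.5, "beta")])
print(read_record(buf, 1))
```

With variable-length records the `seek` calculation no longer works, which is where the separate array of offsets comes in.
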
Anonymous answered Nov 18 '22 00:11


Unless you have unique requirements, use something for which there is already a mature library, so you can avoid writing your own parsing code. That means XML, JSON, etc., as others have said.

One other nice one is Google's protocol buffers (http://code.google.com/p/protobuf). There you write a common message definition and the protocol buffer compiler generates objects for filling out, serializing, and deserializing the data for you. Typically the format is binary, but you can use their TextFormat class to write JSON-like plain text too. The nice thing about protobufs is that the versioning code is generated for you. In version 2 of your file format, all you have to do is add fields to the .proto definition file. The new version can read the old file format, and just leaves the new fields blank. It's not exactly what protobufs were designed for, but they make an easy, efficient binary file format for custom messages, and the code is generated for you.

Also see Facebook's Thrift, now in the Apache incubator.

Kevin Weil answered Nov 17 '22 22:11