XML has a lot of benefits. It's both machine and human readable, it has a standardized format and it is remarkably versatile.
It also has some disadvantages. It's verbose and not a very efficient means of transferring large amounts of data.
One of the most useful aspects of XML is the schema language. Using a schema, you can generate source code in any modern programming language to read an XML format, without the tedious hand coding that accompanies most other file formats.
This got me thinking about whether a schema language for arbitrary binary file formats exists and, if not, whether it would be a worthwhile endeavor.
In case I've been unclear: I'm asking about a language whose purpose is to define byte offsets, field and record lengths, delimiters, etc., and that could be parsed to generate code for reading any file format that conforms to that specification.
I doubt I'm the first to suggest such an idea, so if you know of any projects or working groups that have pursued or are currently pursuing this area, I'd be grateful for pointers.
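To make the idea concrete, here is a minimal sketch in Python of what "a schema that drives a reader" could look like. Everything in it is hypothetical and for illustration only: the `SCHEMA` layout, the field names, and the `build_parser` helper are assumptions, not part of any existing schema language. It only handles fixed-size fields via the standard `struct` module; a real schema language would also need delimiters, variable-length records, and conditionals.

```python
import struct

# Hypothetical schema: an ordered list of (field name, struct format code).
# "4s" = 4 raw bytes, "H" = unsigned 16-bit int, "I" = unsigned 32-bit int.
SCHEMA = [
    ("magic", "4s"),
    ("version", "H"),
    ("record_count", "I"),
]


def build_parser(schema, byte_order="<"):
    """Turn a declarative schema into a function that reads those fields.

    byte_order "<" means little-endian with no alignment padding.
    """
    fmt = byte_order + "".join(code for _, code in schema)
    names = [name for name, _ in schema]
    packed = struct.Struct(fmt)

    def parse(data, offset=0):
        # Decode all fields at once, starting at the given byte offset.
        values = packed.unpack_from(data, offset)
        return dict(zip(names, values))

    parse.size = packed.size  # total byte length described by the schema
    return parse


parse_header = build_parser(SCHEMA)

# Build a sample header in memory instead of reading a real file.
sample = struct.pack("<4sHI", b"DEMO", 2, 7)
header = parse_header(sample)
print(header)
```

The point of the sketch is that the reader is derived from the description rather than written by hand: changing the file layout means editing `SCHEMA`, not the parsing code.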
A schema is a formal definition of the syntax of an XML-based language, that is, it defines a family of XML documents. A schema language is a formal language for expressing schemas. There exists a variety of schema languages, as we shall see later.
Binary files: these files can store multiple types of data, such as images, video, and audio, in the same file.
A binary format is a format in which file information is stored in the form of ones and zeros, or in some other binary (two-state) sequence. This type of format is often used for executable files and numeric information in computer programming and memory.
I know this is an old question, but in the last few years I feel that Kaitai Struct has emerged as one of the best options for describing arbitrary binary formats with a schema. That it also generates parsing code is a huge bonus.
https://kaitai.io/
"develop parsers for binary structures"
xtype is a new general-purpose binary data language I developed that also covers the typical usage of XML: https://github.com/bitagoras/xtype/ A similar format that should be mentioned here is UBJSON, an efficient binary format for JSON-like structures: https://github.com/ubjson/universal-binary-json
Yes, several people have tried to do this.
One such attempt is Binary Format Description. Another is Data Format Description Language. I'm not sure how practical either one really is, though.