What problems was XML invented to solve? From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics. Unlike, for example, an HTML file, a Java source file, or a .docx document, one cannot write a program to extract any kind of high-level meaning from an XML file without lots of additional information. What is the value of having the syntax rigidly specified by some standards committee even when the semantic meaning is completely unspecified? What advantages does XML have over just rolling your own ad-hoc format that does exactly what you need and nothing more? In short, what does XML accomplish and why is it so widely used?
XML forces your data to be well-structured, so that a program which does not understand the semantics of your data will still be able to understand its syntax. This allows things like XSLT, which will transform one well-formed XML document into another. It means that you can manipulate data without having to interpret it. You can see the document is well-formed and valid according to its DTD without needing to understand the contents.
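As a rough illustration of handling data without interpreting it, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `summarize` helper and the sample documents are made up for this example. Note that it only checks well-formedness; the standard-library parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize(xml_text):
    """Return (is_well_formed, tag_counts) for any XML text.

    Hypothetical helper: it knows nothing about what the tags mean.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Syntax error: the text is not well-formed XML.
        return False, Counter()
    # root.iter() walks the root element and all of its descendants.
    return True, Counter(element.tag for element in root.iter())


# Two documents with unrelated semantics, plus one with a mismatched tag.
recipe = "<recipe><step>etch</step><step>rinse</step></recipe>"
invoice = "<invoice><total>12.50</total></invoice>"
broken = "<recipe><step>etch</recipe>"

for document in (recipe, invoice, broken):
    print(summarize(document))
```

The same function works on a recipe and an invoice alike, which is the point: the syntax is shared even though the meaning is not.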
This was a huge step forward for data storage, interoperability, and machine-readability in general.
I personally find XML useful because I find writing parsers to be a pain. If you invent your own data format, you wind up spending a lot of your time writing parsing code: checking for correct input in what could be a lot of user data. Then, after you get all the input and validity-checking code for your parser completed, you have the joy of documenting your file format for anyone else who wants to use it, plus the further joy of finding bugs in your input validation code once they start sending data your way.
With XML the parsing mechanics are well defined, and with XML Schema or DTDs you can specify the formats you are willing to accept. XML parsers are available for almost every major programming language, so the amount of code you have to write, maintain, and document is greatly reduced.
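To give a feel for how little parsing code is left to write, here is a small Python sketch built on the standard library parser. The `user` document shape, the `REQUIRED_FIELDS` tuple, and the `load_user` function are all assumptions made up for this example, and the hand-written field check is only a stand-in for what a real DTD or XML Schema would express declaratively.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: a <user> element with these child elements.
REQUIRED_FIELDS = ("name", "email")


def load_user(xml_text):
    """Parse a hypothetical <user> document into a dict."""
    # The stock parser handles tokenizing, nesting, escaping and
    # encodings; it raises ET.ParseError on malformed input.
    root = ET.fromstring(xml_text)
    if root.tag != "user":
        raise ValueError(f"expected <user>, got <{root.tag}>")
    missing = [field for field in REQUIRED_FIELDS if root.find(field) is None]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # findtext returns "" for an element that is present but empty.
    return {field: root.findtext(field).strip() for field in REQUIRED_FIELDS}


user = load_user("<user><name>A &amp; B</name><email>ab@example.org</email></user>")
print(user)
```

Everything specific to the format fits in a few lines; escaping rules such as `&amp;` are already handled by the parser.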
XML lets you be non-standard in a standard way :). It's ugly, it's verbose, it takes up a lot of space, and it's absolutely invaluable for interoperability. Basically, XML is nice because it gives you a standard way of describing your data so that a single type of parser can handle data from disparate sources.
To use a more concrete example, I used to work in the semiconductor tool industry in the days before XML. Each tool used a recipe to describe how to process a particular wafer. Every one of those tools used a different format for its recipes. Now, pity the poor person (me!) who had to take several of those tools and integrate them into a single processing system. I had to write a different parser for each recipe type and convert recipes from a common store into the format appropriate for a particular tool. It was just a nightmare. If XML had been available, all those recipes could have been defined via XML and any conversions or transformations handled with simple XSLT scripts. It would have saved me literally months of development effort just for that portion of the integration code.
Ad hoc solutions work fine within the confines of your own system, but when you need to communicate with 1...N other systems, XML is a good foundation that all parties can rely on to work, at a minimum, in a certain way. Yes, the data has no semantic meaning, but you're assured that the TRANSFER and CONVERSION of data will still be successful. There are many more reasons, but that's one of the most important, I've always thought.
This is a very primitive example, but think of when systems used to communicate with flat-file data. You could have had a string that other parties had built communication around, such as AAABBBCCCDDD. Other systems knew that they would get the AAA "data" in the first 3 characters, and so on. Now someone changes something on your side and accidentally starts sending BBBAAACCCDDD. Boom, everything is broken.
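With XML you could have both:
<record>
<a>AAA</a>
<b>BBB</b>
<c>CCC</c>
<d>DDD</d>
</record>
AND
<record>
<b>BBB</b>
<a>AAA</a>
<c>CCC</c>
<d>DDD</d>
</record>
without breaking someone else's system, as long as that system looks fields up by element name rather than by position. (The root element is called record here because names beginning with "xml" are reserved by the XML specification.)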
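The difference between the two approaches can be sketched in Python. This is only an illustration: `parse_flat` and `parse_xml` are hypothetical helpers, the fixed 3-character field width comes from the flat-file example above, and the root element name `record` is a placeholder.

```python
import xml.etree.ElementTree as ET


def parse_flat(line):
    """Positional parsing: each field is identified only by its offset."""
    return {"a": line[0:3], "b": line[3:6], "c": line[6:9], "d": line[9:12]}


def parse_xml(text):
    """Name-based parsing: each field is identified by its element name."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


original = "<record><a>AAA</a><b>BBB</b><c>CCC</c><d>DDD</d></record>"
reordered = "<record><b>BBB</b><a>AAA</a><c>CCC</c><d>DDD</d></record>"

# The XML consumer gets the same mapping either way.
print(parse_xml(original) == parse_xml(reordered))

# The flat-file consumer silently reads the wrong value for "a".
print(parse_flat("AAABBBCCCDDD")["a"], parse_flat("BBBAAACCCDDD")["a"])
```

The robustness comes from the consumer addressing data by name; a consumer that read children by index would break on the reordered document just as the flat-file reader does.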
The answer is in your own question. "From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics." Having a uniform syntax solves part of the problem for things that have vastly different semantics, and it's not a trivial problem in the slightest.
Similarly, text encoding is used in markup (including XML), computer programs, human-readable documents, and many more tasks with vastly different semantics. Would you like to reinvent Unicode every single time? Would you even know enough about all the issues to have a chance of doing so, or even a chance of reinventing a passable ASCII? ASCII only seems simple these days because so many of the complicated features of its control codes are no longer used; old-school uses of ASCII are often far more complicated than Unicode.
Numbers are used all over the place in computing, and we still have four different internal syntaxes in use (two byte orders, big- and little-endian, and two complement styles, ones' and two's), though the details are generally hidden these days.
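For readers who have never had to look at these hidden details, here is a small Python sketch of what the different number representations look like as raw bits. The choice of a 4-byte integer for byte order and an 8-bit width for the negative number is an assumption made purely for illustration.

```python
import struct

# The same 32-bit integer, 1, in the two byte orders.
big_endian = struct.pack(">i", 1)     # most significant byte first
little_endian = struct.pack("<i", 1)  # least significant byte first
print(big_endian.hex(), little_endian.hex())

# The same value, -1, as an 8-bit pattern in the two complement styles.
twos_complement = (-1) & 0xFF  # invert the bits of +1, then add one
ones_complement = (~1) & 0xFF  # just invert the bits of +1
print(format(twos_complement, "08b"), format(ones_complement, "08b"))
```

A reader that assumes the wrong convention gets a different number from the same bytes, which is exactly the kind of disagreement a shared syntax avoids.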
A uniform syntax does one chunk of the format creator's work for them. It also means that one chunk of the work for a producer or consumer is something they are already familiar with (and hence may already have tools for). And it completely eliminates one chunk of the work for a program that reads in one format and writes out another.