
XML vs comma delimited text files

Tags: text, xml, csv

OK, I've read a couple of books on XML and written programs to spit it out and whatnot. But here's the question. Both a comma-delimited file and an XML file are "human readable." In general, though, the comma-delimited file is much easier on my eyes than an XML file; the tags typically take up as much space as the data, if not more. That seems to obscure what I'm reading, and the format can take a page to hold the same information that fits on a single line of a comma-delimited file. A comma-delimited file is also significantly less complex to parse. So the real question is: why XML? Just because all the cool kids are doing it?

NoMoreZealots asked Jul 17 '09 00:07


2 Answers

Advantages

A number of advantages XML has over CSV:

  • Hierarchical data organization
  • Automatic data validation (XML Schemas or DTDs)
  • Easily convert formats (using XSL)
  • Easy to identify relational structure
  • Can be used in combination with XML-RPC
  • Suitable for object persistence (marshalling)
  • Simplifies business-to-business communications
  • Helpful related technologies (XPath, DOM)
  • Tight integration with modern Web browsers
  • Extract, Transform, and Load (ETL) tools
  • Backwards file format compatibility (version attribute)
  • Digital signatures

It completely depends on the problem domain and what you are trying to solve.
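A few of the points above (hierarchy, XPath-style queries, a version attribute) can be sketched with Python's standard library. This is only an illustration: the catalog layout, element names, and attribute names below are made up for the example, and ElementTree supports just a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: artists contain albums, albums contain songs.
CATALOG = """\
<catalog version="1">
  <artist name="Artist A">
    <album title="First">
      <song bpm="120">Opening</song>
      <song bpm="95">Closing</song>
    </album>
  </artist>
  <artist name="Artist B">
    <album title="Second">
      <song bpm="140">Fast One</song>
    </album>
  </artist>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Select every song nested under the artist whose name attribute is
# "Artist A". The nesting itself carries the artist/album/song relationship.
songs_by_a = [
    song.text
    for song in root.findall("./artist[@name='Artist A']/album/song")
]

# A version attribute on the root lets a reader branch on the format revision.
file_version = root.get("version")

print(songs_by_a, file_version)
```

Expressing the same artist/album/song nesting in CSV would mean either repeating the artist and album on every row or splitting the data across several files.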

Example

Browser integration is something that many people miss when writing web pages. Consider the situation where you have a large data store of songs. Songs have artists, albums, beats per minute, and so forth. You could export the data to XML, write a simple XSL stylesheet to render the XML as XHTML, then point the browser at the XML page. The browser will apply the stylesheet and render the XML as a web page.

You cannot do that with CSV.
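As a rough sketch of the export step, assuming a stylesheet named songs.xsl already exists next to the output file (that file name, the songs_to_xml helper, and the element and attribute names are all hypothetical), the XML only needs an xml-stylesheet processing instruction ahead of the root element:

```python
import xml.etree.ElementTree as ET


def songs_to_xml(songs, stylesheet_href="songs.xsl"):
    """Serialize song dicts to XML that references an XSL stylesheet.

    The stylesheet itself is not generated here; stylesheet_href is an
    assumed location that a browser would fetch.
    """
    root = ET.Element("songs")
    for song in songs:
        node = ET.SubElement(
            root, "song", artist=song["artist"], bpm=str(song["bpm"])
        )
        node.text = song["title"]

    body = ET.tostring(root, encoding="unicode")
    # The processing instruction tells the browser which stylesheet to apply.
    pi = f'<?xml-stylesheet type="text/xsl" href="{stylesheet_href}"?>'
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{pi}\n{body}'


doc = songs_to_xml([{"title": "Opening", "artist": "Artist A", "bpm": 120}])
print(doc)
```

A browser with XSLT support that loads this file would fetch songs.xsl and display the transformed result rather than the raw tags.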

Disadvantages

Joel Spolsky has a great article on why XML is a poor choice as a complex data store: it is slow. (Unlike a database with fixed-width records, which can step to the previous or next record with a single CPU instruction, traversing records in an XML document means scanning the text, which is much slower.) Arguably, this could be considered an optimization problem, resolved by waiting 18 months for faster hardware. Thus:

  • Slower to parse than other formats
  • Syntactical redundancy can detract from readability
  • Document bloat could affect storage costs
  • Cannot easily model overlapping (non-hierarchical) data structures
  • Poorly designed XML file formats are not uncommon (in my experience; citation needed)
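The redundancy and bloat points are easy to check for yourself. Here is a small sketch that writes the same two made-up records as CSV and as element-per-field XML and compares the character counts; the field names and the to_csv and to_xml helpers are invented for the example, and an attribute-based XML layout would be more compact than the one shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample records; all values are strings for simplicity.
ROWS = [
    {"title": "Opening", "artist": "Artist A", "bpm": "120"},
    {"title": "Closing", "artist": "Artist A", "bpm": "95"},
]


def to_csv(rows):
    """Write rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "artist", "bpm"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Write rows as XML with one child element per field."""
    root = ET.Element("songs")
    for row in rows:
        song = ET.SubElement(root, "song")
        for key, value in row.items():
            ET.SubElement(song, key).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(ROWS)
xml_text = to_xml(ROWS)
print(len(csv_text), len(xml_text))
```

CSV names each field once in the header, while this XML layout repeats an opening and closing tag for every value, which is where the extra size comes from.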

Related Question

See also: Why Should I Use A Human Readable File Format.

Dave Jarvis answered Oct 04 '22 15:10


These aren't the only two options; you can also use JSON or YAML, which are much more lightweight than XML.

In general, if you have simple tabular data without many special characters, CSV isn't a bad choice. For structured data, consider using one of the other three.
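To make that rule of thumb concrete, here is a minimal sketch using only Python's standard library (so JSON rather than YAML, which needs a third-party package). The sample records are invented for illustration.

```python
import csv
import io
import json

# Flat, tabular records fit CSV naturally: one header, one line per row.
flat = io.StringIO("title,bpm\nOpening,120\nClosing,95\n")
rows = list(csv.DictReader(flat))

# Nested records need a structured format; JSON keeps the hierarchy intact.
nested = json.loads(
    '{"artist": "Artist A",'
    ' "albums": [{"title": "First", "songs": ["Opening", "Closing"]}]}'
)
first_album_songs = nested["albums"][0]["songs"]

# Special characters are where CSV gets awkward: commas and quotes inside a
# field force quoting, which the csv module handles on write and on read.
tricky = ["Hello, World", 'say "hi"']
quoted = io.StringIO()
csv.writer(quoted).writerow(tricky)
round_tripped = next(csv.reader(io.StringIO(quoted.getvalue())))

print(rows, first_album_songs, round_tripped)
```

Hand-rolled parsing with a plain split on commas would break on the last case, which is why "without many special characters" matters.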

Dana the Sane answered Oct 04 '22 15:10