 

Which XML library for what purposes?

Tags: python, xml

A search for "python" and "xml" returns a variety of libraries for combining the two.

This list is probably faulty:

  • xml.dom
  • xml.etree
  • xml.sax
  • xml.parsers.expat
  • PyXML
  • beautifulsoup?
  • HTMLParser
  • htmllib
  • sgmllib

It would be nice if someone could offer a quick summary of when to use which, and why.

asked Mar 12 '10 at 04:03 by John Mee




2 Answers

The DOM/SAX divide is a basic one. It applies not just to Python, since DOM and SAX are cross-language.

DOM: read the whole document into memory and manipulate it. Good for:

  • complex relationships across tags in the markup
  • small intricate XML documents
  • Cautions:
    • Easy to use excessive memory
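As a rough illustration of the DOM style, here is a minimal sketch using the standard library's xml.dom.minidom. The document layout (book elements linked by a ref element with a to attribute) and the helper name referenced_titles are hypothetical; the point is that following a cross-reference between tags is straightforward once the whole tree is in memory.

```python
from xml.dom import minidom

# Hypothetical document: one <book> points at another via <ref to="..."/>.
XML = """<library>
  <book id="b1"><title>Alpha</title><ref to="b2"/></book>
  <book id="b2"><title>Beta</title></book>
</library>"""


def referenced_titles(xml_text):
    """Map each book id to the title of the book it references.

    Works because minidom has already loaded the whole document, so any
    element can be looked up no matter where it appears.
    """
    doc = minidom.parseString(xml_text)
    books = {b.getAttribute("id"): b for b in doc.getElementsByTagName("book")}
    result = {}
    for book_id, book in books.items():
        for ref in book.getElementsByTagName("ref"):
            target = books[ref.getAttribute("to")]
            title = target.getElementsByTagName("title")[0].firstChild.data
            result[book_id] = title
    return result


print(referenced_titles(XML))  # {'b1': 'Beta'}
```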

SAX: parse the document while you read it. Good for:

  • Long documents or open ended streams
  • places where memory is a constraint
  • Cautions:
    • You'll need to code a stateful parser, which can be tricky
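A minimal sketch of the SAX style with the standard library's xml.sax, assuming a hypothetical document made of book/title elements. The handler only receives events, so it has to track its own state (here, whether it is currently inside a title) and reassemble text that may arrive in several chunks.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects <title> text while the document streams past."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # state the handler must remember itself
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text can be delivered in several pieces, so accumulate it.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


DOC = b"<library><book><title>Alpha</title></book><book><title>Beta</title></book></library>"
print(collect_titles(DOC))  # ['Alpha', 'Beta']
```

For a genuinely long document or stream you would pass a file object to xml.sax.parse instead of using parseString, so the input is never held in memory all at once.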

BeautifulSoup:

Great for HTML or not-quite-well-formed markup. Easy to use and fast. Good for screen scraping, etc. It can work with markup where the XML-based libraries would just throw an error saying the markup is incorrect.
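BeautifulSoup itself is a third-party package, so this sketch sticks to the standard library to show the contrast described above: the strict xml.etree parser rejects badly nested markup, while the lenient html.parser.HTMLParser (one of the parsers BeautifulSoup can sit on top of, via "html.parser") keeps going. The sample markup and the helpers strict_parse, TextExtractor and lenient_text are hypothetical. Unlike BeautifulSoup, this sketch does not repair the tree; it only collects the text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Badly nested, unclosed markup (hypothetical sample).
BROKEN = "<p>Unclosed paragraph <b>bold <i>nested</b> text"


def strict_parse(markup):
    """An XML parser refuses markup that is not well-formed."""
    try:
        ET.fromstring(markup)
        return "parsed"
    except ET.ParseError as exc:
        return f"error: {exc}"


class TextExtractor(HTMLParser):
    """A lenient HTML parser: it reports whatever text it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.chunks)


print(strict_parse(BROKEN))
print(lenient_text(BROKEN))  # Unclosed paragraph bold nested text
```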

Most of the rest I haven't used, but I don't think there are hard-and-fast rules about when to use which. Just your standard considerations: who is going to maintain the code, which APIs you find easiest to use, how well they work, etc.

In general, for basic needs, it's nice to use the standard library modules, since they are "standard" and thus available and well known. However, if you need to dig deep into something, there are almost always newer nonstandard modules with superior functionality outside the standard library.

answered Sep 30 '22 at 18:09 by Peter Lyons


I find xml.etree essentially sufficient for everything, except that I turn to BeautifulSoup if I ever need to parse broken XML (not a common problem, unlike broken HTML, which BeautifulSoup also helps with and which is everywhere). xml.etree has reasonable support for reading entire XML docs into memory, navigating them, creating them, and incrementally parsing large docs. lxml supports the same interface and is generally faster, which is useful for pushing performance when you can afford to install third-party Python extensions (e.g. on App Engine you can't, but xml.etree is still there, so you can run exactly the same code). lxml also has more features, and offers BeautifulSoup integration too.
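A minimal sketch of the workflow described above (creating a document, navigating it in memory, and parsing it incrementally) written against the ElementTree API. It assumes lxml may or may not be installed and falls back to xml.etree.ElementTree; the catalog/book/title structure and the helper names are hypothetical. lxml's API is a close superset of ElementTree's rather than an exact copy, so the swap works for common calls like these but is not guaranteed for every feature.

```python
import io

# Prefer lxml when available; otherwise use the standard library.
# Both provide the ElementTree-style calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def build_catalog(titles):
    """Create a small XML tree in memory."""
    root = etree.Element("catalog")
    for i, title in enumerate(titles, start=1):
        book = etree.SubElement(root, "book", id=str(i))
        etree.SubElement(book, "title").text = title
    return root


def titles_in_memory(xml_data):
    """Load the whole document, then navigate it."""
    root = etree.fromstring(xml_data)
    return [book.findtext("title") for book in root.findall("book")]


def titles_incrementally(xml_data):
    """Walk a possibly large document with iterparse, freeing finished parts."""
    titles = []
    for _event, elem in etree.iterparse(io.BytesIO(xml_data), events=("end",)):
        if elem.tag == "book":
            titles.append(elem.findtext("title"))
            elem.clear()  # drop the subtree we no longer need
    return titles


root = build_catalog(["Alpha", "Beta"])
xml_bytes = etree.tostring(root)
print(titles_in_memory(xml_bytes))      # ['Alpha', 'Beta']
print(titles_incrementally(xml_bytes))  # ['Alpha', 'Beta']
```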

The other libs you mention mimic APIs designed for very different languages, and in general I see no reason to contort Python into those gyrations. If you have very specific needs, such as support for XSLT or various kinds of validation, it may be worth looking around for other libraries, but I haven't had such needs in a long time, so I'm not current with respect to the offerings for them.

answered Sep 30 '22 at 16:09 by Alex Martelli