
XML Parsing: Element Tree (etree) vs. minidom [duplicate]

I've been using minidom to parse XML for years. Now I've suddenly learned about Element Tree. My question is: which is better for parsing? That is:

  • Which is faster?
  • Which uses less memory?
  • Does either have any O(n^2) dependencies I should worry about?
  • Is one being deprecated in favor of the other?

Why do we have two interfaces?

Thanks.

vy32 asked Nov 05 '11 18:11







1 Answer

The DOM and SAX interfaces are the classic ways to work with XML. Python had to provide those interfaces because they are well-known and standard.

The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
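To make the "more Pythonic" point concrete, here is a minimal sketch that pulls the same data out of one small document with both standard-library APIs. The sample XML and the helper names (titles_minidom, titles_etree) are made up for illustration; only the xml.dom.minidom and xml.etree.ElementTree calls are real library functions.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_minidom(text):
    """Return (id, title) pairs using the DOM-style minidom API."""
    doc = minidom.parseString(text)
    pairs = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # In the DOM, text lives in a child text node, not on the element.
        pairs.append((book.getAttribute("id"), title.firstChild.data))
    return pairs


def titles_etree(text):
    """Return the same pairs using ElementTree."""
    root = ET.fromstring(text)
    # Elements are iterable, attributes come from get(), and findtext()
    # fetches a child's text directly.
    return [(book.get("id"), book.findtext("title")) for book in root.iter("book")]


if __name__ == "__main__":
    print(titles_minidom(XML))
    print(titles_etree(XML))
```

Both functions return the same list of pairs; the ElementTree version simply needs less navigation code to get there.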

Depending on your build, each of those has an underlying C implementation that makes them run fast.

None of the above tools is being deprecated. They each have their merits (SAX doesn't need to read the whole input into memory, for example).
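As a rough sketch of that streaming style, the handler below counts elements as the parser reports them, without ever building a tree. The TitleCounter class, the count_titles helper, and the sample bytes are hypothetical names chosen for this example; xml.sax.parse and xml.sax.ContentHandler are the standard-library pieces.

```python
import io
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Counts <title> elements as they stream past; keeps no tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        if name == "title":
            self.count += 1


def count_titles(stream):
    """Parse a file-like object (or filename) and return the <title> count."""
    handler = TitleCounter()
    xml.sax.parse(stream, handler)
    return handler.count


# Hypothetical sample input; a real use case would pass a large file instead.
SAMPLE = (
    b"<library>"
    b"<book><title>Dune</title></book>"
    b"<book><title>Emma</title></book>"
    b"</library>"
)

if __name__ == "__main__":
    print(count_titles(io.BytesIO(SAMPLE)))
```

The trade-off is that the handler has to track any state it needs itself, which is why tree-based APIs are usually more convenient when the document fits comfortably in memory.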

There is also a third-party module called lxml, which is a popular choice (full-featured and fast).

Raymond Hettinger answered Sep 29 '22 14:09
