I've been using minidom to parse XML for years. Now I've suddenly learned about ElementTree. My question is: which is better for parsing? That is:
Why do we have two interfaces?
Thanks.
xml.dom.minidom is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is intended to be simpler than the full DOM and also significantly smaller. Users who are not already proficient with the DOM should consider using the xml.
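To give a feel for the DOM-style API, here is a minimal sketch using only the standard library. The sample document and the helper name `titles_with_minidom` are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
XML_TEXT = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_with_minidom(xml_text):
    """Return (id, title) pairs using the DOM-style API."""
    doc = minidom.parseString(xml_text)
    pairs = []
    for book in doc.getElementsByTagName("book"):
        title_node = book.getElementsByTagName("title")[0]
        # In the DOM, text lives in a child text node, not on the element.
        pairs.append((book.getAttribute("id"), title_node.firstChild.data))
    return pairs


print(titles_with_minidom(XML_TEXT))
```

Note how the text has to be reached through `firstChild.data`; that extra hop is typical of DOM code.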
Parsing from strings and files. lxml.etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
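The two entry points can be sketched as follows. To keep the example free of third-party dependencies, it uses the standard library's `xml.etree.ElementTree`, which offers functions with the same names; the assumption is that `lxml.etree` can be swapped in via the import line if it is installed. The sample XML is made up.

```python
import io
import xml.etree.ElementTree as ET  # assumption: lxml.etree could be imported as ET instead

# Hypothetical sample data for illustration.
XML_SOURCE = "<root><item>a</item><item>b</item></root>"

# fromstring() takes the XML text and returns the root element directly.
root_from_string = ET.fromstring(XML_SOURCE)

# parse() takes a filename or file-like object and returns a tree object;
# getroot() then gives the root element.
tree = ET.parse(io.BytesIO(XML_SOURCE.encode("utf-8")))
root_from_file = tree.getroot()

items_from_string = [item.text for item in root_from_string.iter("item")]
items_from_file = [item.text for item in root_from_file.iter("item")]
print(items_from_string, items_from_file)
```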
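If parsing speed is a key factor for you, consider using cElementTree or lxml. Note that since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically when it is available, and the separate cElementTree module was removed in Python 3.9.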
The DOM and SAX interfaces are the classic ways to work with XML. Python had to provide those interfaces because they are well-known and standard.
The ElementTree package was intended to provide a more Pythonic interface. It is all about making things easier for the programmer.
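As a rough illustration of what "more Pythonic" means in practice, here is a small sketch with ElementTree. The sample document and the helper name `titles_with_elementtree` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
LIBRARY_XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_with_elementtree(xml_text):
    """Return (id, title) pairs using the ElementTree API."""
    root = ET.fromstring(xml_text)
    # Elements behave like lists of children; attributes behave like a dict.
    return [
        (book.get("id"), book.findtext("title"))
        for book in root.findall("book")
    ]


print(titles_with_elementtree(LIBRARY_XML))
```

Elements can be iterated and searched with `findall()`, attributes are read with `get()`, and `findtext()` returns the text of a child in one call.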
Depending on your build, each of those has an underlying C implementation that makes them run fast.
None of the above tools is being deprecated. They each have their merits (SAX doesn't need to read the whole input into memory, for example).
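The streaming nature of SAX can be sketched like this: the parser calls back into a handler as it reads, so no tree is ever built. The class name `TitleCollector`, the helper `collect_titles`, and the sample data are made up for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as events stream past."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


# Hypothetical sample data for illustration.
SAMPLE = b"<library><book><title>Dune</title></book><book><title>Emma</title></book></library>"
print(collect_titles(SAMPLE))
```

The trade-off is that the handler has to track its own state (here, whether it is inside a `<title>`), which is the price of not holding the document in memory.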
There is also a third-party module called lxml, which is a popular choice (full-featured and fast).