Which XML library for what purposes?

2 Answers

The DOM/SAX divide is a basic one. It is not specific to Python, since both DOM and SAX are cross-language APIs.

DOM: read the whole document into memory and manipulate it. Good for:

- complex relationships across tags in the markup
- small, intricate XML documents

Cautions:
- Easy to use excessive memory
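As a rough illustration of the DOM style, here is a minimal sketch using the standard library's xml.dom.minidom. The document, the "sequel-to" attribute, and the title_of helper are all made up for the example. It shows the kind of cross-tag lookup that is easy once the whole tree is in memory.

```python
from xml.dom import minidom

# Hypothetical sample document: one book refers to another by id.
doc = minidom.parseString(
    "<library>"
    "<book id='b1'><title>A</title></book>"
    "<book id='b2' sequel-to='b1'><title>B</title></book>"
    "</library>"
)

# The whole tree is in memory, so we can index it and jump around freely.
books = {b.getAttribute("id"): b for b in doc.getElementsByTagName("book")}


def title_of(book):
    """Return the text of a book's first <title> child."""
    return book.getElementsByTagName("title")[0].firstChild.data


# Map each sequel's title to the title of the book it follows.
sequels = {
    title_of(b): title_of(books[b.getAttribute("sequel-to")])
    for b in books.values()
    if b.hasAttribute("sequel-to")
}
print(sequels)
```

The price is that every node of the document stays in memory until the tree is released.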

SAX: parse the document while you read it. Good for:

- long documents or open-ended streams
- places where memory is a constraint

Cautions:
- You'll need to code a stateful parser, which can be tricky
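To give an idea of what "stateful parser" means in practice, here is a small sketch with the standard library's xml.sax. The TitleCollector class and the sample document are invented for the example. The handler has to remember for itself whether it is currently inside a <title> element, because SAX only hands it one event at a time.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # state: are we inside <title> right now?
        self.buffer = []       # text chunks of the current title
        self.titles = []       # finished titles

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


handler = TitleCollector()
xml.sax.parseString(
    b"<library><book><title>A</title></book>"
    b"<book><title>B</title></book></library>",
    handler,
)
print(handler.titles)
```

Only the current title is held in memory, which is why this style suits long documents, but the bookkeeping grows quickly once the elements you care about are nested or depend on each other.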

beautifulsoup:

Great for HTML or not-quite-well-formed markup. Easy to use and fast. Good for screen scraping, etc. It can work with markup where the XML-based parsers would just throw an error saying the markup is incorrect.
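A small sketch of that difference, assuming the third-party bs4 package may or may not be installed (the lenient half is skipped if it is not). The sample markup and both helper names are made up for the example. The strict half uses xml.etree from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical broken markup: the <b> tag is never closed.
broken = "<p>unclosed <b>bold</p>"


def strict_parse_fails(markup):
    """Return True if the strict XML parser rejects the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False


def lenient_text(markup):
    """Return the text BeautifulSoup recovers, or None if bs4 is missing."""
    try:
        from bs4 import BeautifulSoup  # third-party, optional
    except ImportError:
        return None
    soup = BeautifulSoup(markup, "html.parser")
    return soup.get_text()


print(strict_parse_fails(broken))
print(lenient_text(broken))
```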

Most of the rest I haven't used, but I don't think there are hard and fast rules about when to use which. Just your standard considerations: who is going to maintain the code, which APIs you find easiest to use, how well they work, etc.

In general, for basic needs, it's nice to use the standard library modules since they are "standard" and thus available and well known. However, if you need to dig deep into something, there are almost always newer third-party modules with superior functionality.

answered Sep 30 '22 18:09

Peter Lyons

I find xml.etree essentially sufficient for everything: it has reasonable support for reading entire XML docs into memory, navigating them, creating them, and incrementally parsing large docs. The one exception is broken XML, for which I use BeautifulSoup. That is not a common problem, unlike broken HTML, which is everywhere and which BeautifulSoup also helps with. lxml supports the same interface and is generally faster, which is useful for pushing performance when you can afford to install third-party Python extensions (e.g. on App Engine you can't, but xml.etree is still there, so you can run exactly the same code). lxml also has more features, and offers BeautifulSoup integration too.
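For illustration, here is a minimal sketch of those ElementTree operations (creating, reading, navigating, incremental parsing). The element names and the iter_titles helper are invented for the example. The import falls back to the standard library when lxml is not installed, on the assumption that only the shared subset of the API is used.

```python
import io

try:
    from lxml import etree as ET  # third-party, optional
except ImportError:
    import xml.etree.ElementTree as ET  # standard library fallback

# Create a document.
root = ET.Element("library")
for book_id, title in [("b1", "First"), ("b2", "Second")]:
    book = ET.SubElement(root, "book", id=book_id)
    ET.SubElement(book, "title").text = title
xml_bytes = ET.tostring(root)

# Read it back into memory and navigate it.
parsed = ET.fromstring(xml_bytes)
titles = [t.text for t in parsed.findall("book/title")]


def iter_titles(stream):
    """Yield book titles one at a time without keeping the books around."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title")
            elem.clear()  # drop the book's children once it is handled


streamed = list(iter_titles(io.BytesIO(xml_bytes)))
print(titles, streamed)
```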

The other libs you mention mimic APIs designed for very different languages, and in general I see no reason to contort Python into those gyrations. If you have very specific needs, such as support for XSLT or various kinds of validation, it may be worth looking around for other libraries, but I haven't had such needs in a long time, so I'm not current on the offerings for them.

answered Sep 30 '22 16:09

Alex Martelli


Tags:

python

xml

John Mee
