lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers.
There is a lot of documentation on the web and also in the Python standard library documentation, as lxml implements the well-known ElementTree API and tries to follow its documentation as closely as possible. The recipes in Fredrik Lundh's element library are generally worth taking a look at.
lxml. etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
I'm trying to specify a namespace using lxml similar to this example (taken from here):
<TreeInventory xsi:noNamespaceSchemaLocation="Trees.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
</TreeInventory>
I'm not sure how to add the Schema instance to use and also the Schema location. The documentation got me started, by doing something like:
>>> NS = 'http://www.w3.org/2001/XMLSchema-instance'
>>> TREE = '{%s}' % NS
>>> NSMAP = {None: NS}
>>> tree = etree.Element(TREE + 'TreeInventory', nsmap=NSMAP)
>>> etree.tostring(tree, pretty_print=True)
'<TreeInventory xmlns="http://www.w3.org/2001/XMLSchema-instance"/>\n'
I'm not sure how to specify it an instance though, and then also specify a location. It seems like this can be done with the nsmap
keyword-arg in etree.Element
, but I don't see how.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With