I'm using lxml (2.2.8) to create and write out some XML (specifically XGMML). The app which will be reading it is apparently fairly fussy and wants to see a top level element with:
<graph label="Test" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="h
ttp://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-
ns#" xmlns:cy="http://www.cytoscape.org" xmlns="http://www.cs.rpi.edu/XGMML" di
rected="1">
How do I setup those xmlns:
attributes with lxml ? If I try the obvious
root.attrib['xmlns:dc']='http://purl.org/dc/elements/1.1/'
root.attrib['xmlns:xlink']='http://www.w3.org/1999/xlink'
root.attrib['xmlns:rdf']='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
root.attrib['xmlns:cy']='http://www.cytoscape.org'
root.attrib['xmlns']='http://www.cs.rpi.edu/XGMML'
lxml throws a ValueError: Invalid attribute name u'xmlns:dc'
I've used XML and lxml a fair amount in the past for simple stuff, but managed to avoid needing to know anything about namespaces so far.
Parsing from strings and files. lxml. etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
Introduction. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API.
lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers. This is when the lxml library comes to play.
Unlike ElementTree or other serializers that would allow this, lxml
needs you to set up these namespaces beforehand:
NSMAP = {"dc" : 'http://purl.org/dc/elements/1.1',
"xlink" : 'http://www.w3.org/1999/xlink'}
root = Element("graph", nsmap = NSMAP)
(and so on and so forth for the rest of the declarations)
And then you can use the namespaces using their proper declarations:
n = SubElement(root, "{http://purl.org/dc/elements/1.1}foo")
Of course this gets annoying to type, so it is generally beneficial to assign the paths to short constant names:
DCNS = "http://purl.org/dc/elements/1.1"
And then use that variable in both the NSMAP
and the SubElement
declarations:
n = SubElement(root, "{%s}foo" % (DCNS))
Using ElementMaker:
import lxml.etree as ET
import lxml.builder as builder
E = builder.ElementMaker(namespace='http://www.cs.rpi.edu/XGMML',
nsmap={None: 'http://www.cs.rpi.edu/XGMML',
'dc': 'http://purl.org/dc/elements/1.1/',
'xlink': 'http://www.w3.org/1999/xlink',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'cy': 'http://www.cytoscape.org', })
graph = E.graph(label="Test", directed="1")
print(ET.tostring(graph, pretty_print=True))
yields
<graph xmlns:cy="http://www.cytoscape.org" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.cs.rpi.edu/XGMML" directed="1" label="Test"/>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With