 

How to write namespaced element attributes with LXML?

I'm using lxml (2.2.8) to create and write out some XML (specifically XGMML). The application that will read it is apparently fairly fussy and wants to see a top-level element like this:

<graph label="Test"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:cy="http://www.cytoscape.org"
       xmlns="http://www.cs.rpi.edu/XGMML"
       directed="1">

How do I set up those xmlns: attributes with lxml? If I try the obvious:

root.attrib['xmlns:dc']='http://purl.org/dc/elements/1.1/'
root.attrib['xmlns:xlink']='http://www.w3.org/1999/xlink'
root.attrib['xmlns:rdf']='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
root.attrib['xmlns:cy']='http://www.cytoscape.org'
root.attrib['xmlns']='http://www.cs.rpi.edu/XGMML'

lxml throws a ValueError: Invalid attribute name u'xmlns:dc'

I've used XML and lxml a fair amount in the past for simple stuff, but managed to avoid needing to know anything about namespaces so far.

asked Oct 09 '11 10:10 by timday



2 Answers

Unlike ElementTree and some other serializers, which let you write xmlns: attributes directly, lxml needs you to declare these namespaces up front, through the nsmap argument, when you create the element:

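from lxml.etree import Element, SubElement

# Namespace URIs must match the reader's exactly, including the trailing slash on dc.
NSMAP = {"dc": 'http://purl.org/dc/elements/1.1/',
         "xlink": 'http://www.w3.org/1999/xlink'}

root = Element("graph", nsmap=NSMAP)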

(add the rdf and cy prefixes the same way; the default namespace, xmlns="...", uses None as its key)
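Filled out for all five declarations from the question, the setup might look like the sketch below. It assumes Python 3 and a reasonably recent lxml (for `encoding="unicode"`). The root tag is written in `{uri}graph` form so that it actually lives in the default XGMML namespace, rather than being an un-namespaced element that merely sits next to a default declaration.

```python
from lxml import etree

# Namespace URIs copied from the question; the None key declares the default namespace.
XGMML = "http://www.cs.rpi.edu/XGMML"
NSMAP = {
    None: XGMML,
    "dc": "http://purl.org/dc/elements/1.1/",
    "xlink": "http://www.w3.org/1999/xlink",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cy": "http://www.cytoscape.org",
}

# Qualify the root with the default namespace so it serializes as an unprefixed <graph>.
root = etree.Element("{%s}graph" % XGMML, nsmap=NSMAP)
root.set("label", "Test")
root.set("directed", "1")

# encoding="unicode" returns str instead of bytes.
xml = etree.tostring(root, encoding="unicode")
print(xml)
```

lxml emits every entry of NSMAP as an xmlns declaration on the root, so nothing needs to be written as an attribute by hand.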

Then create elements in those namespaces by writing the tag as {namespace-URI}localname (Clark notation); lxml substitutes the matching prefix on output:

n = SubElement(root, "{http://purl.org/dc/elements/1.1/}foo")

This gets tedious to type, so it helps to assign the namespace URIs to short constants:

DCNS = "http://purl.org/dc/elements/1.1/"

Then use that constant both in NSMAP and in the SubElement calls:

n = SubElement(root, "{%s}foo" % (DCNS))
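The same Clark notation works for namespaced attributes, which is what the question's title asks about. In this sketch the names `node`, `nodeLabel`, and the `http://example.org/...` URL are hypothetical placeholders, not taken from the XGMML or Cytoscape specs. Note that unprefixed attribute names such as `id` stay in no namespace; the default namespace never applies to attributes.

```python
from lxml import etree

XGMML = "http://www.cs.rpi.edu/XGMML"
XLINK = "http://www.w3.org/1999/xlink"
CY = "http://www.cytoscape.org"
NSMAP = {None: XGMML, "xlink": XLINK, "cy": CY}

root = etree.Element("{%s}graph" % XGMML, nsmap=NSMAP)

# Hypothetical child element; keyword arguments become plain (un-namespaced) attributes.
node = etree.SubElement(root, "{%s}node" % XGMML, id="1")

# Namespaced attributes use {uri}local keys; lxml looks up the prefix
# (xlink, cy) from the declarations on the ancestor element.
node.set("{%s}href" % XLINK, "http://example.org/node/1")  # placeholder URL
node.set("{%s}nodeLabel" % CY, "first")                     # hypothetical attribute name

xml = etree.tostring(root, encoding="unicode")
print(xml)
```

The serialized child should carry `xlink:href="..."` and `cy:nodeLabel="..."` without any extra xmlns attributes, because the prefixes are already declared on the root.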
answered Oct 22 '22 07:10 by Nick Bastin


Using ElementMaker:

import lxml.etree as ET
import lxml.builder as builder
E = builder.ElementMaker(namespace='http://www.cs.rpi.edu/XGMML',
                         nsmap={None: 'http://www.cs.rpi.edu/XGMML',
                         'dc': 'http://purl.org/dc/elements/1.1/',
                         'xlink': 'http://www.w3.org/1999/xlink',
                         'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
                         'cy': 'http://www.cytoscape.org', })
graph = E.graph(label="Test", directed="1")
print(ET.tostring(graph, pretty_print=True))

yields

<graph xmlns:cy="http://www.cytoscape.org" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.cs.rpi.edu/XGMML" directed="1" label="Test"/>
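ElementMaker's keyword arguments can only produce plain attribute names, since `xlink:href` is not a valid Python identifier. A dict passed as a positional argument is merged into the element's attributes, so Clark-notation keys can go there. A minimal sketch, assuming Python 3; `node` and the example.org URL are hypothetical placeholders:

```python
import lxml.etree as ET
import lxml.builder as builder

XGMML = "http://www.cs.rpi.edu/XGMML"
XLINK = "http://www.w3.org/1999/xlink"

E = builder.ElementMaker(namespace=XGMML,
                         nsmap={None: XGMML, "xlink": XLINK})

# Keyword arguments give un-namespaced attributes; the positional dict
# supplies the namespaced one using a {uri}local key.
graph = E.graph(
    E.node({"{%s}href" % XLINK: "http://example.org/node/1"}, id="1"),
    label="Test",
    directed="1",
)

xml = ET.tostring(graph, encoding="unicode")
print(xml)
```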
answered Oct 22 '22 07:10 by unutbu