Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Making lxml.objectify ignore xml namespaces?

So I gotta deal with some xml that looks like this:

<ns2:foobarResponse xmlns:ns2="http://api.example.com">
  <duration>206</duration>
  <artist>
    <tracks>...</tracks>
  </artist>
</ns2:foobarResponse>

I found lxml and it's objectify module, that lets you traverse a xml document in a pythonic way, like a dictionary.
Problem is: it's using the bogus xml namespace every time you try to access an element, like this:

from lxml import objectify

tree = objectify.fromstring(xml)
print tree.artist
# ERROR: no such child: {http://api.example.com}artist

It's trying to access <artist> with the parent namespace, but the tag doesn't use the ns.

Any ideas how to get around this? Thanks

like image 468
adamJLev Avatar asked Dec 28 '22 13:12

adamJLev


2 Answers

According to the lxml.objectify documentation, attribute lookups default to using the namespace of their parent element.

What you probably want to work would be:

print tree["{}artist"]

QName syntax like this would work if your children had a non-empty namespace ("{http://foo/}artist", for instance), but unfortunately, it looks like the current source code treats an empty namespace as no namespace, so all of objectify's lookup goodness will helpfully replace the empty namespace with the parent namespace, and you're out of luck.

This is either a bug ("{}artist" should work), or an enhancement request to file for the lxml folks.

For the moment, the best thing to do is probably:

print tree.xpath("artist")

It's unclear to me how much of a performance hit you'll take using xpath here, but this certainly works.

like image 148
Jeffrey Harris Avatar answered Dec 31 '22 03:12

Jeffrey Harris


Just for reference: Note that this works as expected since lxml 2.3.

From the lxml changelog:

" [...]

2.3 (2011-02-06) Features added

  • When looking for children, lxml.objectify takes '{}tag' as meaning an empty namespace, as opposed to the parent namespace.

[...]"

In action:

>>> xml = """<ns2:foobarResponse xmlns:ns2="http://api.example.com">
...   <duration>206</duration>
...   <artist>
...     <tracks>...</tracks>
...   </artist>
... </ns2:foobarResponse>"""
>>> tree = objectify.fromstring(xml)
>>> print tree['{}artist']
artist = None [ObjectifiedElement]
    tracks = '...' [StringElement]
>>>
like image 33
Holger Avatar answered Dec 31 '22 03:12

Holger