
Suppress namespace in ElementTree

Given an xml file that looks like this:

<?xml version="1.0" encoding="windows-1252"?>
<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>

When I parse it with ElementTree, the element tags look like:

{http://example.com/ns}Message
  {http://example.com/ns}foo
  {urn:us:gov:dot:faa:aim:saa}bar

But I'd rather just have

Message
  foo
  bar

and more importantly, I'd rather just pass "Message", "foo", and "bar" into the find() and findall() methods.

I've tried using substitutions to strip all xmlns: attributes, as suggested in https://stackoverflow.com/a/15641319/338479 (and this is probably what I'll have to do if I can't find something more elegant). I've also tried calling ElementTree.register_namespace('', "http://example.com/ns"), but that only seems to help with ElementTree.tostring(), which isn't what I want.
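A minimal sketch of the register_namespace behaviour described above, assuming Python 3 and the sample document from the question (the XML declaration is left out for brevity). It suggests the call changes how the tree is serialized but leaves the parsed tags fully qualified:

```python
import xml.etree.ElementTree as ET

# Sample document from the question, without the XML declaration.
XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>"""

# Note: this updates a process-wide registry, not just this tree.
ET.register_namespace("", "http://example.com/ns")
root = ET.fromstring(XML)

# Serialization now writes the default namespace without a generated prefix...
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# ...but the in-memory tag is unchanged, so an unqualified find() still misses.
print(root.tag)
print(root.find("foo"))
```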

Isn't there just some way to get ElementTree to pretend it never heard of xmlns?

Let's assume that my element tags are globally unique even without the namespace qualifiers. In this case, the namespaces just get in the way.


Addressing some of the comments in detail:

Joe linked to the question "Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method 'find', 'findall'", which is close enough to my question that I guess mine is a duplicate. However, that question was not answered either. The suggestions given there were:

  • Use tree.findall("xmlns:DEAL_LEVEL/xmlns:PAID_OFF", namespaces={'xmlns': 'http://www.test.com'}).
    • I couldn't find the documentation for that call with those arguments in https://docs.python.org/2/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findall, and at any rate it requires that I know all of the namespaces.
  • Pre-process the input XML and strip the xmlns attributes from the input as mentioned above.
  • Post-process the parsed document and strip all the namespaces from the tags.
    • Frankly, I like this approach the best. I will post the code as an answer.
  • Use register_namespace("", "http://example.com/ns")
    • This suppresses the namespace when using ElementTree.tostring(el) but not in el.tag. I expect it doesn't help find() or findall() either.
    • Again, this doesn't solve the problem where I need to know all the namespaces in advance (or extract them from the document somehow).
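For comparison, here is a hedged sketch of the first suggestion in the list, assuming Python 3 and the sample document from the question (XML declaration omitted). The prefixes "m" and "s" are made up for this example; they only need to map to the right URIs and do not have to match the prefixes used in the document. The second half relies on the `{*}` wildcard, which the ElementTree documentation lists as available from Python 3.8 and which does not require knowing the namespaces in advance:

```python
import xml.etree.ElementTree as ET

XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>"""

root = ET.fromstring(XML)

# Option 1: a prefix-to-URI map (hypothetical prefixes "m" and "s").
NS = {"m": "http://example.com/ns", "s": "urn:us:gov:dot:faa:aim:saa"}
foo = root.find("m:foo", namespaces=NS)
bar = root.find("s:bar", namespaces=NS)
print(foo.get("id"), bar.tag)

# Option 2 (Python 3.8 or later): match the local name in any namespace.
any_foo = root.findall("{*}foo")
any_bar = root.findall("{*}bar")
print(len(any_foo), len(any_bar))
```

Neither option changes el.tag, so code that inspects tags directly still sees the qualified names.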
Edward Falk, asked Sep 13 '15 05:09


1 Answer

OK, thanks for the links to the other question. I've decided to borrow (and improve on) one of the solutions given there:

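def stripNs(el):
  '''Recursively walk this element tree, removing namespaces in place.'''
  if el.tag.startswith("{"):
    el.tag = el.tag.split('}', 1)[1]  # strip namespace from the tag
  # Iterate over a copy of the keys: the dict is modified inside the loop,
  # and changing it while iterating the live view fails on Python 3.
  for k in list(el.attrib.keys()):
    if k.startswith("{"):
      k2 = k.split('}', 1)[1]  # strip namespace from the attribute name
      el.attrib[k2] = el.attrib[k]
      del el.attrib[k]
  for child in el:
    stripNs(child)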
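A short usage sketch, assuming Python 3 and the sample document from the question (XML declaration omitted). The function is repeated here, with the attribute keys copied before the loop, so the snippet runs on its own. It relies on the question's assumption that local names are unique without their namespaces:

```python
import xml.etree.ElementTree as ET

def stripNs(el):
    """Recursively remove namespaces from tags and attribute names, in place."""
    if el.tag.startswith("{"):
        el.tag = el.tag.split("}", 1)[1]
    for k in list(el.attrib):  # copy the keys: attrib is modified in the loop
        if k.startswith("{"):
            el.attrib[k.split("}", 1)[1]] = el.attrib.pop(k)
    for child in el:
        stripNs(child)

XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>"""

root = ET.fromstring(XML)
stripNs(root)

tags = [el.tag for el in root.iter()]
print(tags)                        # ['Message', 'foo', 'bar']
print(root.find("foo").get("id"))  # stuffid
print(root.find("bar") is not None)
```

After stripping, two elements or attributes that differed only by namespace become indistinguishable, and a namespaced attribute silently overwrites an unqualified one with the same local name.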
Edward Falk, answered Oct 19 '22 07:10