
Accessing XMLNS attribute with Python ElementTree?

How can one access namespace (xmlns) attributes using ElementTree?

With the following:

<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22"> 

When I call root.get('xmlns') I get back None, although category and date come back fine. Any help appreciated.

Melchior asked Dec 23 '09 16:12



2 Answers

I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.

    >>> from xml.etree import ElementTree as ET
    >>> data = '''<data xmlns="http://www.foo.net/a"
    ...                 xmlns:a="http://www.foo.net/a"
    ...                 book="1" category="ABS" date="2009-12-22"/>'''
    >>> element = ET.fromstring(data)
    >>> element
    <Element {http://www.foo.net/a}data at 1013b74d0>
    >>> element.tag
    '{http://www.foo.net/a}data'
    >>> element.attrib
    {'category': 'ABS', 'date': '2009-12-22', 'book': '1'}

If you just want to know the xmlns URI, you can split it out with a function like:

    def tag_uri_and_name(elem):
        if elem.tag[0] == "{":
            uri, ignore, tag = elem.tag[1:].partition("}")
        else:
            uri = None
            tag = elem.tag
        return uri, tag
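As a quick check, here is how that helper might be applied to the element from the question (with the trailing slash added) and to an element that has no namespace at all. This is only a small sketch; the two sample strings and the variable names `namespaced` and `plain` are made up for illustration.

```python
from xml.etree import ElementTree as ET

def tag_uri_and_name(elem):
    # Same helper as in the answer: split '{uri}name' into (uri, name).
    if elem.tag[0] == "{":
        uri, _, tag = elem.tag[1:].partition("}")
    else:
        uri, tag = None, elem.tag
    return uri, tag

# The element from the question, made well-formed with a trailing slash.
namespaced = ET.fromstring(
    '<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" '
    'book="1" category="ABS" date="2009-12-22"/>'
)
# An illustrative element with no namespace declaration.
plain = ET.fromstring('<data book="1"/>')

print(tag_uri_and_name(namespaced))  # ('http://www.foo.net/a', 'data')
print(tag_uri_and_name(plain))       # (None, 'data')
```

Note that this recovers only the namespace the element itself lives in, not the prefixes (such as `a`) that were declared on it.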

For much more on namespaces and qualified names in ElementTree, see effbot's examples.

Jeffrey Harris answered Jan 03 '23 00:01


Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.
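A rough sketch of that per-element idea, using the standard-library `xml.etree.ElementTree`, might look like the following. It is not effbot's code verbatim: Element objects may not accept arbitrary new attributes on some Python versions, so this variant keeps the mappings in a separate dict keyed by element instead of setting `ns_map` on each one. The function name `parse_with_ns_maps` and the sample document are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

def parse_with_ns_maps(source):
    """Parse source and return (tree, ns_maps), where ns_maps[elem] is the
    prefix -> URI mapping in scope at elem."""
    events = ("start", "start-ns", "end-ns")
    root = None
    in_scope = []  # stack of (prefix, uri) declarations currently open
    ns_maps = {}
    for event, elem in ET.iterparse(source, events):
        if event == "start-ns":
            in_scope.append(elem)   # elem is a (prefix, uri) tuple here
        elif event == "end-ns":
            in_scope.pop()          # one declaration went out of scope
        else:  # "start"
            if root is None:
                root = elem
            # Later (inner) declarations override earlier (outer) ones.
            ns_maps[elem] = dict(in_scope)
    return ET.ElementTree(root), ns_maps

# Illustrative document: the default namespace is reported with prefix ''.
doc = io.BytesIO(
    b'<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a">'
    b'<a:child xmlns:b="http://www.foo.net/b"/><sibling/></data>'
)
tree, ns_maps = parse_with_ns_maps(doc)
root = tree.getroot()
child, sibling = list(root)
print(ns_maps[root])
print(ns_maps[child])
print(ns_maps[sibling])
```

In this sketch the `b` prefix should show up only in the map for `child`, since its declaration goes out of scope before `sibling` starts.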

However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces used, so that element lookups are easier and the URIs are not hardcoded.

Here's what I came up with:

    # The standalone "elementtree" package is obsolete; use the stdlib module.
    import xml.etree.ElementTree as ET

    def parse_and_get_ns(file):
        events = "start", "start-ns"
        root = None
        ns = {}
        for event, elem in ET.iterparse(file, events):
            if event == "start-ns":
                # For "start-ns" events, elem is a (prefix, uri) tuple.
                prefix, uri = elem
                qualified = "{%s}" % uri
                # Compare against the stored "{uri}" form, so that repeating
                # the same declaration is not reported as a conflict.
                if prefix in ns and ns[prefix] != qualified:
                    # NOTE: It is perfectly valid to have the same prefix refer
                    #     to different URI namespaces in different parts of the
                    #     document. This exception serves as a reminder that this
                    #     solution is not robust.    Use at your own peril.
                    raise KeyError("Duplicate prefix with different URI found.")
                ns[prefix] = qualified
            elif event == "start":
                if root is None:
                    root = elem
        return ET.ElementTree(root), ns

With this you can parse an XML file and obtain a dict with the namespace mappings. So, if you have an XML file like the following ("my.xml"):

    <?xml version="1.0" encoding="UTF-8" ?>
    <rss version="2.0"
         xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
    <feed>
      <item>
        <title>Foo</title>
        <dc:creator>Joe McGroin</dc:creator>
        <description>etc...</description>
      </item>
    </feed>
    </rss>

You will be able to use the XML namespaces and get info for elements like dc:creator:

    >>> tree, ns = parse_and_get_ns("my.xml")
    >>> ns
    {u'content': '{http://purl.org/rss/1.0/modules/content/}', u'dc': '{http://purl.org/dc/elements/1.1/}'}
    >>> item = tree.find("./feed/item")
    >>> item.findtext(ns['dc']+"creator")
    'Joe McGroin'
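
A related sketch, assuming Python 3 and the standard-library `xml.etree.ElementTree`: `find()` and `findtext()` also accept a `namespaces` mapping of prefix to plain URI (without the braces), so the prefix can be written directly in the path. The helper name `collect_ns` and the inline copy of my.xml are just for illustration, and unlike the function above this version silently lets the last declaration of a prefix win.

```python
import io
import xml.etree.ElementTree as ET

# Inline copy of "my.xml" so the example is self-contained.
MY_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>
"""

def collect_ns(source):
    """Return (root, {prefix: uri}); the last declaration of a prefix wins."""
    ns = {}
    root = None
    for event, elem in ET.iterparse(source, ("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = elem      # plain URI, no surrounding braces
            ns[prefix] = uri
        elif root is None:
            root = elem             # first "start" event is the root
    return root, ns

root, ns = collect_ns(io.BytesIO(MY_XML))
# The "dc:" prefix in the path is resolved through the namespaces mapping.
creator = root.findtext("feed/item/dc:creator", namespaces=ns)
print(creator)  # Joe McGroin
```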

deancutlet answered Jan 02 '23 23:01