How can one access NS attributes through using ElementTree?
With the following:
<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">
When I try to root.get('xmlns') I get back None, Category and Date are fine, Any help appreciated..
There are two ways to parse the file using 'ElementTree' module. The first is by using the parse() function and the second is fromstring() function. The parse () function parses XML document which is supplied as a file whereas, fromstring parses XML when supplied as a string i.e within triple quotes.
Example Read XML File in Python To read an XML file, firstly, we import the ElementTree class found inside the XML library. Then, we will pass the filename of the XML file to the ElementTree. parse() method, to start parsing. Then, we will get the parent tag of the XML file using getroot() .
I think element.tag
is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.
>>> from xml.etree import ElementTree as ET >>> data = '''<data xmlns="http://www.foo.net/a" ... xmlns:a="http://www.foo.net/a" ... book="1" category="ABS" date="2009-12-22"/>''' >>> element = ET.fromstring(data) >>> element <Element {http://www.foo.net/a}data at 1013b74d0> >>> element.tag '{http://www.foo.net/a}data' >>> element.attrib {'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
If you just want to know the xmlns URI, you can split it out with a function like:
def tag_uri_and_name(elem): if elem.tag[0] == "{": uri, ignore, tag = elem.tag[1:].partition("}") else: uri = None tag = elem.tag return uri, tag
For much more on namespaces and qualified names in ElementTree, see effbot's examples.
Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.
However, that adds the ns_map attribute to all the elements. For my needs, I found I wanted a global map of all the namespaces used to make element look up easier and not hardcoded.
Here's what I came up with:
import elementtree.ElementTree as ET def parse_and_get_ns(file): events = "start", "start-ns" root = None ns = {} for event, elem in ET.iterparse(file, events): if event == "start-ns": if elem[0] in ns and ns[elem[0]] != elem[1]: # NOTE: It is perfectly valid to have the same prefix refer # to different URI namespaces in different parts of the # document. This exception serves as a reminder that this # solution is not robust. Use at your own peril. raise KeyError("Duplicate prefix with different URI found.") ns[elem[0]] = "{%s}" % elem[1] elif event == "start": if root is None: root = elem return ET.ElementTree(root), ns
With this you can parse an xml file and obtain a dict with the namespace mappings. So, if you have an xml file like the following ("my.xml"):
<?xml version="1.0" encoding="UTF-8" ?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"\ > <feed> <item> <title>Foo</title> <dc:creator>Joe McGroin</dc:creator> <description>etc...</description> </item> </feed> </rss>
You will be able to use the xml namepaces and get info for elements like dc:creator:
>>> tree, ns = parse_and_get_ns("my.xml") >>> ns {u'content': '{http://purl.org/rss/1.0/modules/content/}', u'dc': '{http://purl.org/dc/elements/1.1/}'} >>> item = tree.find("/feed/item") >>> item.findtext(ns['dc']+"creator") 'Joe McGroin'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With