 

How to get an attribute of an Element that is namespaced

I'm parsing an XML document that I receive from a vendor every day, and it uses namespaces heavily. I've reduced the problem to a minimal subset here:

There are some elements I need to parse, all of which are children of an element with a specific attribute in it.
I am able to use lxml.etree.Element.findall(TAG, root.nsmap) to find the candidate nodes whose attribute I need to check.

I'm then trying to check the attribute of each of these Elements by the name I know it uses, which here is ss:Name. If that attribute has the desired value, I'll dive deeper into that Element (to continue doing other things).

How can I do this?

The XML I'm parsing looks roughly like this:

<FOO xmlns="SOME_REALLY_LONG_STRING"
 some gorp declaring a bunch of namespaces one of which is 
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar" OTHER_ATTRIBS_I_DONT_CARE_ABOUT>
        ....
        <MoreThingsToLookAtLater>
            ....
        </MoreThingsToLookAtLater>
        ....
    </SomethingIWant>
    ...
</FOO>

I found the first SomethingIWant element like this (ultimately I want all of them, which is why I used findall):

from lxml import etree

tree = etree.parse(myfilename)
root = tree.getroot()
# I want just the first one for now
my_sheet = root.findall('ss:SomethingIWant', root.nsmap)[0]

Now I want to get the ss:Name attribute from this element to check it, but I'm not sure how.

I know that my_sheet.attrib shows the attribute as the raw namespace URI followed by the attribute name, but I don't want that. I need to check whether a specific namespaced attribute has a specific value (because if it doesn't, I can skip this element from further processing entirely).

I tried using attrib.get() on the element, but I don't seem to get anything useful back.
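To make the attrib behaviour concrete, here is a small sketch of what lxml actually stores. It assumes a made-up URI (urn:example:vendor) bound to both the default namespace and the ss prefix, mirroring the question's setup. The prefixed string "ss:Name" is not a key in attrib; lxml keys namespaced attributes in Clark notation, "{uri}localname", which is why a plain get("ss:Name") comes back empty.

```python
from lxml import etree

# Hypothetical stand-in for the vendor file: the default namespace and the
# "ss" prefix are bound to the same made-up URI, as in the question.
XML = b"""<FOO xmlns="urn:example:vendor" xmlns:ss="urn:example:vendor">
  <SomethingIWant ss:Name="bar" other="x"/>
</FOO>"""

root = etree.fromstring(XML)
ss_uri = root.nsmap["ss"]  # look the URI up instead of hard-coding it
elem = root.find("ss:SomethingIWant", {"ss": ss_uri})

# Namespaced attribute keys are stored in Clark notation: "{uri}localname".
# The unprefixed attribute has no namespace (default namespaces don't apply to attributes).
print(dict(elem.attrib))  # → {'{urn:example:vendor}Name': 'bar', 'other': 'x'}

# The prefixed form is not a key, so this lookup finds nothing.
print(elem.get("ss:Name"))  # → None

# Building the Clark-notation key from the prefix's URI works.
name_key = etree.QName(ss_uri, "Name").text  # "{urn:example:vendor}Name"
print(elem.get(name_key))  # → bar
```

etree.QName(uri, localname).text only builds the "{uri}localname" string, so it is equivalent to formatting the braces by hand.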

Any ideas?

asked Jun 26 '15 01:06 by UpAndAdam


2 Answers

One of the advantages of lxml over the standard Python XML parser is lxml's full support for the XPath 1.0 specification via the xpath() method, so I would use xpath() most of the time. Here is a working example for your case:

from lxml import etree

xml = """<FOO xmlns="SOME_REALLY_LONG_STRING"
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar">
        ....
    </SomethingIWant>
    ...
</FOO>"""

root = etree.fromstring(xml)
ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT'}

# i want just the first one for now
result = root.xpath('//@ss:Name', namespaces=ns)[0]
print(result)

Output:

bar

UPDATE:

A modified example showing how to get a namespaced attribute from the current element:

ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT', 'd': 'SOME_REALLY_LONG_STRING'}

element = root.xpath('//d:SomethingIWant', namespaces=ns)[0]
print(etree.tostring(element))

attribute = element.xpath('@ss:Name', namespaces=ns)[0]
print(attribute)

Output:

<SomethingIWant xmlns="SOME_REALLY_LONG_STRING" xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT" ss:Name="bar">
        ....
    </SomethingIWant>
    ...

bar
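Since the goal is to process only the elements whose ss:Name has a particular value, the check can also be folded into the XPath itself with a predicate. The sketch below assumes a made-up URI (urn:example:vendor) for both prefixes and a second, non-matching element added for illustration; it passes the wanted value as an XPath variable (lxml's xpath() accepts extra keyword arguments as $variables) rather than splicing it into the expression string.

```python
from lxml import etree

# Hypothetical sample document; the URI is made up for illustration.
XML = b"""<FOO xmlns="urn:example:vendor" xmlns:ss="urn:example:vendor">
  <SomethingIWant ss:Name="bar">
    <MoreThingsToLookAtLater>keep me</MoreThingsToLookAtLater>
  </SomethingIWant>
  <SomethingIWant ss:Name="baz">
    <MoreThingsToLookAtLater>skip me</MoreThingsToLookAtLater>
  </SomethingIWant>
</FOO>"""

root = etree.fromstring(XML)
# XPath 1.0 has no default namespace, so unprefixed elements still need a prefix.
ns = {"d": "urn:example:vendor", "ss": "urn:example:vendor"}

# The predicate keeps only elements whose ss:Name equals $name.
wanted = root.xpath("//d:SomethingIWant[@ss:Name = $name]", namespaces=ns, name="bar")

for elem in wanted:
    # Dive deeper only into the matching elements.
    for inner in elem.xpath("d:MoreThingsToLookAtLater", namespaces=ns):
        print(inner.text)  # → keep me
```

This returns the elements themselves rather than the attribute strings, so the non-matching elements never need to be visited in Python.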
answered Nov 17 '22 03:11 by har07


I'm pretty sure this is a horribly non-Pythonic, non-ideal way to do it, and it seems like there must be a better way, but I discovered I could do this:

SS_REAL = "{%s}" % root.nsmap.get('ss')

and then I could do: my_sheet.get(SS_REAL + "Name")

It gets me what I want, but this can't possibly be the right way to do it.
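Clark notation ("{uri}localname") is how lxml and ElementTree themselves represent namespaced names, so building the key from root.nsmap is a legitimate approach. A somewhat tidier sketch of the same idea builds the key once and skips non-matching elements in a loop. The helper name matching_elements and the URI urn:example:vendor are hypothetical, chosen only for this illustration.

```python
from lxml import etree

# Hypothetical sample; in the real file the URIs come from the vendor.
XML = b"""<FOO xmlns="urn:example:vendor" xmlns:ss="urn:example:vendor">
  <SomethingIWant ss:Name="bar"><MoreThingsToLookAtLater/></SomethingIWant>
  <SomethingIWant ss:Name="baz"><MoreThingsToLookAtLater/></SomethingIWant>
</FOO>"""


def matching_elements(root, wanted_name):
    """Hypothetical helper: yield each ss:SomethingIWant whose ss:Name equals wanted_name."""
    ns = {"ss": root.nsmap["ss"]}
    # Build the Clark-notation key "{uri}Name" once, outside the loop.
    name_key = etree.QName(ns["ss"], "Name").text
    for elem in root.iterfind("ss:SomethingIWant", ns):
        if elem.get(name_key) != wanted_name:
            continue  # wrong value: skip this element entirely
        yield elem


root = etree.fromstring(XML)
found = list(matching_elements(root, "bar"))
print(len(found))  # → 1
```

Note that XML names are case-sensitive, so the local name must be spelled exactly as in the document ("Name", not "NAME").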

answered Nov 17 '22 03:11 by UpAndAdam