Unable to find element by attribute with lxml

Question

I'm querying a European Space Agency API (result can be viewed here) for satellite image metadata, which I want to parse into Python objects.

Using the requests library I can successfully get the result in XML format and then read the content with lxml. I am able to find the elements and explore the tree as expected:

# parse the response; fromstring() returns the root Element directly
root = etree.fromstring(response.content)
ns = root.nsmap

# get the first entry element and its summary
e = root.find('entry',ns)
summary = e.find('summary',ns).text

print summary

>> 'Date: 2018-11-28T09:10:56.879Z, Instrument: OLCI, Mode: , Satellite: Sentinel-3, Size: 713.99 MB'

The entry element has several date descendants with different values of the name attribute:

for d in e.findall('date',ns):
    print d.tag, d.attrib

>> {http://www.w3.org/2005/Atom}date {'name': 'creationdate'}
   {http://www.w3.org/2005/Atom}date {'name': 'beginposition'}
   {http://www.w3.org/2005/Atom}date {'name': 'endposition'}
   {http://www.w3.org/2005/Atom}date {'name': 'ingestiondate'}

I want to grab the beginposition date element using the XPath predicate syntax [@attrib='value'], but find() just returns None. Even searching for any date element that has a name attribute ([@attrib]) returns None:

dt_begin = e.find('date[@name="beginposition"]',ns) # dt_begin is None
dt_begin = e.find('date[@name]',ns)                 # dt_begin is None

The entry element includes other children that exhibit the same behaviour, e.g. multiple str elements that also have differing name attributes.

Has anyone encountered anything similar, or is there something I'm missing? I'm using Python 2.7.14 with lxml 4.2.4.

mzjn · Accepted Answer

It looks like an explicit prefix is needed when a predicate ([@name="beginposition"]) is used. Here is a test program:

from lxml import etree

print etree.LXML_VERSION

tree = etree.parse("data.xml")  

ns1 = tree.getroot().nsmap
print ns1
print tree.find('entry', ns1)
print tree.find('entry/date', ns1)
print tree.find('entry/date[@name="beginposition"]', ns1)

ns2 = {"atom": 'http://www.w3.org/2005/Atom'}
print tree.find('atom:entry', ns2)
print tree.find('atom:entry/atom:date', ns2)
print tree.find('atom:entry/atom:date[@name="beginposition"]', ns2)

Output:

(4, 2, 5, 0)
{None: 'http://www.w3.org/2005/Atom', 'opensearch': 'http://a9.com/-/spec/opensearch/1.1/'}
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750b90>
<Element {http://www.w3.org/2005/Atom}date at 0x7f89877503f8>
None
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750098>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a950>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a7a0>
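
Applied to the code in the question, the workaround could look roughly like the sketch below. The FEED string is a made-up, trimmed-down stand-in for the real API response, and with_default_prefix and the "atom" prefix are names chosen here for illustration; neither comes from lxml. The import falls back to the standard library only so that the sketch runs where lxml is not installed; the find() calls are the same either way.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same find() calls

# Hypothetical sample feed; the element values are invented for illustration.
FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <entry>
    <date name="creationdate">2018-11-28T10:00:00.000Z</date>
    <date name="beginposition">2018-11-28T09:10:56.879Z</date>
    <date name="endposition">2018-11-28T09:13:56.879Z</date>
  </entry>
</feed>"""


def with_default_prefix(nsmap, prefix="atom"):
    """Copy an nsmap-style dict, renaming the default (None) key to `prefix`."""
    return {(prefix if key is None else key): uri for key, uri in nsmap.items()}


root = etree.fromstring(FEED)

# Same mapping that root.nsmap reports in the output above, written out as a
# literal so the sketch does not depend on the lxml-only nsmap attribute.
ns = with_default_prefix({
    None: "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
})

entry = root.find("atom:entry", ns)

# Explicit prefix on the step that carries the predicate.
dt_begin = entry.find('atom:date[@name="beginposition"]', ns)
print(dt_begin.text)  # 2018-11-28T09:10:56.879Z

# Alternative without any prefix map: spell out the namespace in {uri}tag form.
dt_end = entry.find('{http://www.w3.org/2005/Atom}date[@name="endposition"]')
print(dt_end.get("name"))  # endposition
```

With lxml, with_default_prefix(root.nsmap) should give the same mapping without hard-coding the URIs.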
