Unable to find element by attribute with lxml

I'm using a European Space Agency API to query for satellite image metadata (the result can be viewed here), which I want to parse into Python objects.

Using the requests library I can successfully get the result in XML format and then read the content with lxml. I am able to find the elements and explore the tree as expected:

# parse the response; etree.fromstring() returns the root Element directly
root = etree.fromstring(response.content)
ns = root.nsmap

# get the first entry element and its summary
e = root.find('entry',ns)
summary = e.find('summary',ns).text

print summary

>> 'Date: 2018-11-28T09:10:56.879Z, Instrument: OLCI, Mode: , Satellite: Sentinel-3, Size: 713.99 MB'

The entry element has several date children with different values of the name attribute:

for d in e.findall('date',ns):
    print d.tag, d.attrib

>> {http://www.w3.org/2005/Atom}date {'name': 'creationdate'}
   {http://www.w3.org/2005/Atom}date {'name': 'beginposition'}
   {http://www.w3.org/2005/Atom}date {'name': 'endposition'}
   {http://www.w3.org/2005/Atom}date {'name': 'ingestiondate'}

I want to grab the beginposition date element using XPath syntax [@attrib='value'] but it just returns None. Even just searching for a date element with the name attribute ([@attrib]) returns None:

dt_begin = e.find('date[@name="beginposition"]',ns) # dt_begin is None
dt_begin = e.find('date[@name]',ns)                 # dt_begin is None

The entry element includes other children that exhibit the same behaviour, e.g. multiple str elements that also have differing name attributes.

Has anyone encountered anything similar, or is there something I'm missing? I'm using Python 2.7.14 with lxml 4.2.4.

Ali asked Jan 01 '26 14:01


1 Answer

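It looks like an explicit namespace prefix is needed when the path contains a predicate such as [@name="beginposition"]. With only the default (None-keyed) mapping taken from nsmap, the plain paths match but the path with the predicate returns None. Here is a test program: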

from lxml import etree

print etree.LXML_VERSION

tree = etree.parse("data.xml")

# ns1: the map from nsmap, where the Atom namespace is the default (key None)
ns1 = tree.getroot().nsmap
print ns1
print tree.find('entry', ns1)
print tree.find('entry/date', ns1)
print tree.find('entry/date[@name="beginposition"]', ns1)  # predicate, no prefix

# ns2: the same Atom namespace bound to an explicit "atom" prefix
ns2 = {"atom": 'http://www.w3.org/2005/Atom'}
print tree.find('atom:entry', ns2)
print tree.find('atom:entry/atom:date', ns2)
print tree.find('atom:entry/atom:date[@name="beginposition"]', ns2)  # predicate, prefixed

Output:

(4, 2, 5, 0)
{None: 'http://www.w3.org/2005/Atom', 'opensearch': 'http://a9.com/-/spec/opensearch/1.1/'}
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750b90>
<Element {http://www.w3.org/2005/Atom}date at 0x7f89877503f8>
None
<Element {http://www.w3.org/2005/Atom}entry at 0x7f8987750098>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a950>
<Element {http://www.w3.org/2005/Atom}date at 0x7f898774a7a0>
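
Applied to the code in the question, the workaround could look roughly like the sketch below. It is only an illustration under some assumptions: the SAMPLE document is a made-up, heavily trimmed stand-in for the real API response (the date values are placeholders), the "atom" prefix is an arbitrary choice, and the import falls back to the standard library's ElementTree so the snippet still runs where lxml is not installed. The second lookup spells the namespace out in {uri}tag form, which avoids the namespace map entirely.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for the API response; all values are placeholders.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <summary>placeholder summary</summary>
    <date name="creationdate">2000-01-02T00:00:00Z</date>
    <date name="beginposition">2000-01-01T00:00:00Z</date>
    <date name="endposition">2000-01-01T00:03:00Z</date>
  </entry>
</feed>"""

ATOM = "http://www.w3.org/2005/Atom"

# Bind the Atom namespace to an explicit prefix instead of the None key.
NS = {"atom": ATOM}

root = etree.fromstring(SAMPLE)
entry = root.find("atom:entry", NS)

# Prefixed tag plus attribute predicate.
dt_begin = entry.find('atom:date[@name="beginposition"]', NS)
begin_text = dt_begin.text

# Alternative: fully qualified {uri}tag form, no namespace map needed.
dt_clark = entry.find('{%s}date[@name="beginposition"]' % ATOM)

print(begin_text)
```

The attribute name inside the predicate stays unprefixed in both lookups, since the name attribute in the feed is not in any namespace.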
mzjn answered Jan 06 '26 05:01


