selecting attribute values from lxml

I want to use an xpath expression to get the value of an attribute.

I expected the following to work:

    from lxml import etree

    for customer in etree.parse('file.xml').getroot().findall('BOB'):
        print(customer.find('./@NAME'))

but this gives an error:

    Traceback (most recent call last):
      File "bob.py", line 22, in <module>
        print customer.find('./@ID')
      File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
      File "/usr/local/lib/python2.7/dist-packages/lxml/_elementpath.py", line 272, in find
        it = iterfind(elem, path, namespaces)
      File "/usr/local/lib/python2.7/dist-packages/lxml/_elementpath.py", line 262, in iterfind
        selector = _build_path_iterator(path, namespaces)
      File "/usr/local/lib/python2.7/dist-packages/lxml/_elementpath.py", line 246, in _build_path_iterator
        selector.append(ops[token[0]](_next, token))
    KeyError: '@'

Am I wrong to expect this to work?

GHZ asked May 25 '11 15:05




2 Answers

find and findall only implement a subset of XPath, and that subset can only return elements, not attribute values, which is why './@NAME' fails. The two methods exist to provide compatibility with other ElementTree implementations (like ElementTree and cElementTree).

The xpath method, in contrast, provides full access to XPath 1.0:

    print(customer.xpath('./@NAME')[0])

However, you could instead use get:

    print(customer.get('NAME'))

or attrib:

    print(customer.attrib['NAME'])
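
The three approaches differ in what happens when the attribute is missing, which may matter when choosing between them. A minimal sketch, assuming a made-up document in which the second BOB element has no NAME attribute (the XML string and its values are illustrative, not taken from the question):

```python
from lxml import etree

# Hypothetical sample data: the second BOB has no NAME attribute.
root = etree.fromstring(
    "<customers>"
    "<BOB NAME='alice' ID='1'/>"
    "<BOB ID='2'/>"
    "</customers>"
)

first, second = root.findall("BOB")

# xpath() returns a list of matching attribute values (empty if none match).
names_first = first.xpath("./@NAME")
names_second = second.xpath("./@NAME")

# get() returns None (or a supplied default) when the attribute is absent.
name_or_none = second.get("NAME")
name_or_default = second.get("NAME", "unknown")

# attrib behaves like a dict, so a missing key raises KeyError.
try:
    second.attrib["NAME"]
    raised = False
except KeyError:
    raised = True

print(names_first, names_second, name_or_none, name_or_default, raised)
```

Because xpath() returns a list, indexing with [0] as above raises IndexError when nothing matches, so get() with a default is probably the gentlest option for optional attributes.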
unutbu answered Sep 21 '22 17:09



As a possibly useful addition, here is how to get the value of one attribute when an element has several and its attributes are the only thing that distinguishes it from a sibling element. E.g., given the following file.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <level1>
      <level2 first_att='att1' second_att='foo'>8</level2>
      <level2 first_att='att2' second_att='bar'>8</level2>
    </level1>

One can access the attribute value 'bar' with:

    import lxml.etree as etree

    tree = etree.parse("file.xml")
    print(tree.xpath("//level1/level2[@first_att='att2']/@second_att")[0])
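
To try the same query without a file on disk, the XML can be held in a bytes literal. This is only a sketch: the helper name `second_att_for` is made up for illustration, and it passes the value being matched as an XPath variable (`$key`) rather than building the expression by string formatting.

```python
from lxml import etree

# Same content as file.xml above, as bytes (lxml rejects str input
# that carries an encoding declaration).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<level1>
  <level2 first_att='att1' second_att='foo'>8</level2>
  <level2 first_att='att2' second_att='bar'>8</level2>
</level1>
"""

root = etree.fromstring(XML)


def second_att_for(first_att_value):
    """Return second_att of the level2 whose first_att matches, or None."""
    # $key is an XPath variable bound by the keyword argument below.
    matches = root.xpath(
        "/level1/level2[@first_att=$key]/@second_att", key=first_att_value
    )
    return matches[0] if matches else None


print(second_att_for("att2"))
print(second_att_for("missing"))
```

Returning None for an empty result avoids the IndexError that a bare [0] would raise when no element matches.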
Use Me answered Sep 22 '22 17:09
