
How to retrieve all child nodes in a single query using lxml & XPATH

This is my XML data:

<location>
   <city>
      <name> New York</name>
      <type>non-capital</type>
   </city>

   <city>
        <name> London</name>
        <type>capital</type>
   </city>
</location>

Using lxml and Python:

from lxml import etree as ET

parser = ET.XMLParser(recover=True)

tree = ET.fromstring(xml_data,parser)
print(tree.xpath('//city//name/text() | //city//type/text()'))

The above code works, but it returns one flat list. I'd like a nested list with one inner list per city, such as [['New York','non-capital'],['London','capital']]

What XPath query, or combination of queries and loops, would produce that?

asked Mar 27 '15 05:03 by wolfgang



2 Answers

This is one possible way:

.......
result = []
for city in tree.xpath('//city'):
    result.append([city.find('name').text, city.find('type').text])

print(result)
# output :
#[[' New York', 'non-capital'], [' London', 'capital']]
answered Oct 03 '22 00:10 by har07
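
Note that the output above still carries the leading space from the source XML (' New York'), while the question asks for 'New York'. A minimal sketch of the same per-city loop with the whitespace stripped follows. It assumes each city has at most one name and one type child, and it uses findtext() with an empty default so a missing child yields '' instead of raising. The calls used (fromstring, iter, findtext) belong to the ElementTree API that lxml mirrors, so the sketch falls back to the standard library if lxml is not installed.

```python
try:
    from lxml import etree as ET
except ImportError:
    # Fallback: the standard library provides the same calls used below
    import xml.etree.ElementTree as ET

xml_data = """<location>
   <city>
      <name> New York</name>
      <type>non-capital</type>
   </city>
   <city>
        <name> London</name>
        <type>capital</type>
   </city>
</location>"""

tree = ET.fromstring(xml_data)

# Build one inner list per <city>; strip() removes the stray whitespace
# around each value, and default="" covers a missing child element
result = [
    [city.findtext("name", default="").strip(),
     city.findtext("type", default="").strip()]
    for city in tree.iter("city")
]

print(result)
# [['New York', 'non-capital'], ['London', 'capital']]
```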


List comprehension solution:

xml_data='''<location>
   <city>
      <name> New York</name>
      <type>non-capital</type>
   </city>
   <city>
        <name> London</name>
        <type>capital</type>
   </city>
</location>'''

from lxml import etree as ET

parser = ET.XMLParser(recover=True)

tree = ET.fromstring(xml_data,parser)
print(tree.xpath('//city'))


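# Collect the text of every child element of each <city>.
# (Filtering on c.tail would drop children when the XML has no whitespace between tags.)
cities = [[c.text for c in n] for n in tree.xpath('//city')]
print(cities)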

Results in:

[[' New York', 'non-capital'], [' London', 'capital']]
answered Oct 03 '22 00:10 by Marcin
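
A single XPath 1.0 expression can only return a flat node set, so some grouping in Python is needed either way. As a hedged variant of the approaches above, the sketch below keeps the grouping in a list comprehension but lets XPath do the whitespace cleanup through the standard normalize-space() function, evaluated relative to each child element. It assumes every child of a city is a simple text-only element, as in the sample data.

```python
from lxml import etree as ET

xml_data = """<location>
   <city>
      <name> New York</name>
      <type>non-capital</type>
   </city>
   <city>
        <name> London</name>
        <type>capital</type>
   </city>
</location>"""

tree = ET.fromstring(xml_data)

# Outer loop: one entry per <city> element.
# Inner loop: every child element of that city ("*"), with its text
# trimmed and collapsed by the XPath function normalize-space().
cities = [
    [child.xpath("normalize-space(.)") for child in city.xpath("*")]
    for city in tree.xpath("//city")
]

print(cities)
# [['New York', 'non-capital'], ['London', 'capital']]
```

Because the inner query selects "*" rather than named children, this version picks up any additional child elements a city might have, in document order.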