This is my XML data:
<location>
<city>
<name> New York</name>
<type>non-capital</type>
</city>
<city>
<name> London</name>
<type>capital</type>
</city>
</location>
I'm parsing it with lxml in Python:
from lxml import etree as ET
parser = ET.XMLParser(recover=True)
tree = ET.fromstring(xml_data,parser)
print(tree.xpath('//city//name/text() | //city//type/text()'))
The above code works, but it returns one flat list. I'd like a nested list with one entry per city, such as [['New York','non-capital'],['London','capital']].
What XPath query, or combination of queries and loops, would produce that?
For a div element with an id attribute of hero, selected by //div[@id='hero'], these XPath expressions select elements as follows: //div[@id='hero']/* selects all of its child elements; //div[@id='hero']/img selects all of its child img elements; //div[@id='hero']//* selects all of its descendant elements.
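To make the difference between the child axis (/) and the descendant axis (//) concrete, here is a minimal sketch. The markup is made up for illustration and is not from the question; only the three XPath expressions come from the text above.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
markup = '''<body>
<div id="hero">
<img src="a.png"/>
<p>Intro <img src="b.png"/></p>
</div>
</body>'''

root = etree.fromstring(markup)

# Direct children of the hero div, of any tag.
children = root.xpath("//div[@id='hero']/*")
# Direct children that are <img> elements only.
child_imgs = root.xpath("//div[@id='hero']/img")
# Every element nested anywhere below the hero div.
descendants = root.xpath("//div[@id='hero']//*")

print([el.tag for el in children])
print([el.get('src') for el in child_imgs])
print([el.tag for el in descendants])
```

The nested img inside the p element shows up only in the descendant query, not in the child queries.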
Parsing from strings and files: lxml.etree supports parsing XML in a number of ways and from all important sources, namely strings, files, URLs (http/ftp) and file-like objects. The main parse functions are fromstring() and parse(), both called with the source as first argument.
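As a small sketch of the two entry points, assuming a short XML document held in memory (the sample bytes are made up for illustration): fromstring() returns the root element directly, while parse() takes a file name or file-like object and returns an ElementTree.

```python
from io import BytesIO
from lxml import etree

# Hypothetical sample document, invented for this illustration.
xml_bytes = b"<location><city><name>London</name></city></location>"

# fromstring() parses a string/bytes value and returns the root Element.
root = etree.fromstring(xml_bytes)

# parse() reads from a file name or file-like object and returns an ElementTree;
# BytesIO stands in for a real file here.
tree = etree.parse(BytesIO(xml_bytes))

print(root.tag)
print(tree.getroot().tag)
```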
This is one possible way, looping over each city element and reading its children:
.......
result = []
for city in tree.xpath('//city'):
    # find() is relative to the current <city>, so each pair stays together
    result.append([city.find('name').text, city.find('type').text])
print(result)
# output :
#[[' New York', 'non-capital'], [' London', 'capital']]
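Note that the names above keep the leading space from the source XML (' New York'), whereas the question asks for 'New York'. A minimal sketch of one way to handle that, assuming every city has both a name and a type child, is to strip the text in Python. The use of findtext() with a default is a choice made for this sketch, not part of the original answer.

```python
from lxml import etree as ET

xml_data = '''<location>
<city>
<name> New York</name>
<type>non-capital</type>
</city>
<city>
<name> London</name>
<type>capital</type>
</city>
</location>'''

tree = ET.fromstring(xml_data, ET.XMLParser(recover=True))

result = []
for city in tree.xpath('//city'):
    # findtext() returns the child's text, or the default if the child is missing
    name = city.findtext('name', default='').strip()
    kind = city.findtext('type', default='').strip()
    result.append([name, kind])

print(result)
```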
List comprehension solution:
xml_data='''<location>
<city>
<name> New York</name>
<type>non-capital</type>
</city>
<city>
<name> London</name>
<type>capital</type>
</city>
</location>'''
from lxml import etree as ET
parser = ET.XMLParser(recover=True)
tree = ET.fromstring(xml_data,parser)
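# Iterating over a <city> element yields its child elements (<name>, <type>).
# The original filter "if c.tail" only passed because of the newlines between tags,
# so it is dropped here; compact XML without whitespace would have produced empty lists.
cities = [[child.text for child in city] for city in tree.xpath('//city')]
print(cities)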
Results in:
[[' New York', 'non-capital'], [' London', 'capital']]
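If you would rather keep the text extraction in XPath, one option is to evaluate a relative expression on each city element. This is a sketch rather than the only approach: normalize-space() is a standard XPath 1.0 function that trims leading and trailing whitespace, so it also removes the leading space in the names.

```python
from lxml import etree as ET

xml_data = '''<location>
<city>
<name> New York</name>
<type>non-capital</type>
</city>
<city>
<name> London</name>
<type>capital</type>
</city>
</location>'''

tree = ET.fromstring(xml_data, ET.XMLParser(recover=True))

# Each inner xpath() call is evaluated relative to one <city> element,
# so name and type from the same city end up in the same sub-list.
cities = [
    [city.xpath('normalize-space(name)'), city.xpath('normalize-space(type)')]
    for city in tree.xpath('//city')
]
print(cities)
```

A single XPath 1.0 expression cannot return nested lists on its own, which is why the outer loop (or comprehension) over //city is still needed.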