
How to parse XML and count instances of a particular node attribute?

Tags: python, xml

I have many rows in a database that contain XML, and I'm trying to write a Python script to count instances of a particular node attribute.

My tree looks like:

    <foo>
       <bar>
          <type foobar="1"/>
          <type foobar="2"/>
       </bar>
    </foo>

How can I access the attribute values "1" and "2" in the XML using Python?

randombits asked Dec 16 '09 05:12



2 Answers

minidom (xml.dom.minidom in the standard library) is the quickest option and pretty straightforward.

XML:

    <data>
        <items>
            <item name="item1"></item>
            <item name="item2"></item>
            <item name="item3"></item>
            <item name="item4"></item>
        </items>
    </data>

Python:

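    from xml.dom import minidom

    # Parse the file and collect every <item> element in the document.
    xmldoc = minidom.parse('items.xml')
    itemlist = xmldoc.getElementsByTagName('item')

    # Number of <item> elements, then the "name" attribute of the first one.
    print(len(itemlist))
    print(itemlist[0].attributes['name'].value)

    # The "name" attribute of every <item>.
    for s in itemlist:
        print(s.attributes['name'].value)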

Output:

    4
    item1
    item1
    item2
    item3
    item4
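Applied to the tree from the question, the same approach might look like the sketch below. It assumes the XML arrives as a string (as it would from a database row) rather than from a file, so it uses minidom.parseString; the variable names are only illustrative.

```python
from xml.dom import minidom

# The XML from the question, held in a string as it might be when read from a database row.
xml_text = """<foo>
   <bar>
      <type foobar="1"/>
      <type foobar="2"/>
   </bar>
</foo>"""

doc = minidom.parseString(xml_text)
type_nodes = doc.getElementsByTagName("type")

# getAttribute returns "" for a missing attribute, so check hasAttribute first.
foobar_values = [
    node.getAttribute("foobar")
    for node in type_nodes
    if node.hasAttribute("foobar")
]

print(len(foobar_values))  # how many <type> elements carry a foobar attribute
print(foobar_values)       # the attribute values themselves
```

Note that getElementsByTagName matches at any depth, so this counts every type element in the document, not only those directly under bar.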
Ryan Christensen answered Sep 23 '22 00:09



I suggest ElementTree. There are other compatible implementations of the same API, such as lxml, and cElementTree in the Python standard library itself (since Python 3.3 the C accelerator is used automatically by xml.etree.ElementTree, and the separate cElementTree module was removed in Python 3.9). In this context, what they chiefly add is even more speed; the ease of programming depends on the API, which ElementTree defines.

First build an Element instance root from the XML, e.g. with the XML function, or by parsing a file with something like:

    import xml.etree.ElementTree as ET
    root = ET.parse('thefile.xml').getroot()

Or use any of the many other ways shown in the ElementTree documentation. Then do something like:

    for type_tag in root.findall('bar/type'):
        value = type_tag.get('foobar')
        print(value)

Similar code patterns are usually just as simple.
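Since the question is about counting across many database rows, here is a minimal sketch of one way to tally the values. The rows list is a hypothetical stand-in for whatever your database query returns, and using collections.Counter is my own choice here, not something the question requires.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for XML strings fetched from database rows.
rows = [
    '<foo><bar><type foobar="1"/><type foobar="2"/></bar></foo>',
    '<foo><bar><type foobar="1"/></bar></foo>',
]

counts = Counter()
for xml_text in rows:
    # fromstring parses XML held in a string and returns the root Element.
    root = ET.fromstring(xml_text)
    for type_tag in root.findall("bar/type"):
        value = type_tag.get("foobar")  # None when the attribute is absent
        if value is not None:
            counts[value] += 1

print(counts)                # occurrences of each foobar value
print(sum(counts.values()))  # total number of foobar attributes seen
```

If you only need the total, you can skip the Counter and increment a plain integer instead.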

Alex Martelli answered Sep 24 '22 00:09
