Efficient way to iterate through xml elements

I have XML like this:

```xml
<root>
    <a>
        <b>hello</b>
        <b>world</b>
    </a>
    <x>
        <y></y>
    </x>
    <a>
        <b>first</b>
        <b>second</b>
        <b>third</b>
    </a>
</root>
```

I need to iterate through all <a> and <b> tags, but I don't know how many of them are in the document. So I use XPath to handle that:

```python
from lxml import etree

# xml holds the document shown above
doc = etree.fromstring(xml)

atags = doc.xpath('//a')
for a in atags:
    btags = a.xpath('b')
    for b in btags:
        print(b.text)
```

It works, but my files are quite large, and cProfile shows that the XPath calls are very expensive.
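For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of wrapping a parse in cProfile and printing the most expensive calls. It is an illustration under assumptions, not the asker's setup: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies, the input is synthetic, and SAMPLE, nested_lookup and profile_call are hypothetical names.

```python
import cProfile
import io
import pstats
import xml.etree.ElementTree as ET

# Hypothetical synthetic input: the question's <a>/<x> pattern repeated many times
SAMPLE = (
    b"<root>"
    + b"<a><b>hello</b><b>world</b></a><x><y></y></x>" * 1000
    + b"</root>"
)


def nested_lookup(data):
    """Hypothetical workload: find each <a>, then its direct <b> children."""
    root = ET.fromstring(data)
    return [b.text for a in root.iter("a") for b in a.findall("b")]


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    report = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return result, report.getvalue()


if __name__ == "__main__":
    texts, report = profile_call(nested_lookup, SAMPLE)
    print(len(texts), "b elements")
    print(report)
```

Swapping nested_lookup for a function that runs the lxml XPath version would give the same kind of report for the original code.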

Is there a more efficient way to iterate through an unknown number of XML elements?

asked Jan 14 '11 20:01

nukl

2 Answers

XPath should be fast. You can reduce the number of XPath calls to one:

```python
from lxml import etree

doc = etree.fromstring(xml)
btags = doc.xpath('//a/b')
for b in btags:
    print(b.text)
```
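If lxml is not available, the standard library's xml.etree.ElementTree supports a limited XPath subset that is enough for this one-query approach. A minimal sketch, assuming the document has a wrapping root element as in the sample; b_texts_single_query is a hypothetical helper name. lxml elements also provide the same iterfind method.

```python
import xml.etree.ElementTree as ET


def b_texts_single_query(data):
    """Collect the text of every <b> that is a direct child of an <a>, in one query."""
    root = ET.fromstring(data)
    # './/a/b' is ElementTree's (limited) XPath equivalent of lxml's '//a/b';
    # it matches <a> elements below the root, not the root element itself.
    # iterfind yields matches lazily instead of building a list first.
    return [b.text for b in root.iterfind(".//a/b")]


if __name__ == "__main__":
    data = (
        b"<root><a><b>hello</b><b>world</b></a><x><y></y></x>"
        b"<a><b>first</b><b>second</b><b>third</b></a></root>"
    )
    print(b_texts_single_query(data))
```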

If that is not fast enough, you could try Liza Daly's fast_iter. This has the advantage of not requiring the entire XML to be parsed with etree.fromstring first, and parent nodes are thrown away after their children have been visited. Both of these things help reduce memory requirements. Below is a modified version of fast_iter that is more aggressive about removing other elements that are no longer needed.

```python
import io

from lxml import etree


def fast_iter(context, func, *args, **kwargs):
    """
    fast_iter is useful if you need to free memory while iterating through a
    very large XML file.

    http://lxml.de/parsing.html#modifying-the-tree
    Based on Liza Daly's fast_iter
    http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    See also http://effbot.org/zone/element-iterparse.htm
    """
    for event, elem in context:
        func(elem, *args, **kwargs)
        # It's safe to call clear() here because no descendants will be
        # accessed
        elem.clear()
        # Also eliminate now-empty references from the root node to elem
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context


def process_element(elt):
    print(elt.text)


# BytesIO needs bytes: if xml is a str, pass xml.encode() instead
context = etree.iterparse(io.BytesIO(xml), events=('end',), tag='b')
fast_iter(context, process_element)
```

Liza Daly's article on parsing large XML files may also be useful reading. According to the article, lxml with fast_iter can be faster than cElementTree's iterparse (see Table 1).
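For comparison, the same clear-as-you-go idea can be sketched with the standard library alone (in Python 3, xml.etree.ElementTree uses the C accelerator that used to be exposed as cElementTree). This is a minimal sketch, not a port of fast_iter: it assumes every <a> is a direct child of the root element, handles each whole <a> subtree at its end event, and then clears the root; iter_b_texts is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_b_texts(source):
    """Yield the text of each <b> inside an <a>, freeing finished subtrees.

    Assumes every <a> is a direct child of the document's root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the 'start' of the root element; keep a handle to it
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "a":
            for b in elem.findall("b"):
                yield b.text
            # Detach everything parsed so far (this <a> and any earlier
            # siblings such as <x>) so memory use stays roughly constant
            root.clear()


if __name__ == "__main__":
    data = (
        b"<root><a><b>hello</b><b>world</b></a><x><y></y></x>"
        b"<a><b>first</b><b>second</b><b>third</b></a></root>"
    )
    # iterparse also accepts a filename, which avoids loading the file at all
    for text in iter_b_texts(io.BytesIO(data)):
        print(text)
```

Clearing the root rather than each element is the usual workaround here, because standard-library elements have no getparent() to walk back up the tree the way fast_iter does.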

answered Oct 05 '22 11:10

unutbu

How about iter?

```python
>>> for tags in root.iter('b'):   # root is the ElementTree object
...     print(tags.tag, tags.text)
...
b hello
b world
b first
b second
b third
```
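root.iter('b') visits every <b> regardless of which <a> it belongs to. If the per-<a> grouping from the question matters, iter can be combined with findall, which (like the question's a.xpath('b')) matches direct children only. A minimal sketch using the standard library's xml.etree.ElementTree (lxml elements provide the same iter and findall methods); walk_a_and_b is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def walk_a_and_b(data):
    """Yield (a_index, b_text) pairs so each <b> stays tied to its <a>."""
    root = ET.fromstring(data)
    for i, a in enumerate(root.iter("a")):
        # findall('b') looks at direct children only, with no XPath engine involved
        for b in a.findall("b"):
            yield i, b.text


if __name__ == "__main__":
    data = (
        b"<root><a><b>hello</b><b>world</b></a><x><y></y></x>"
        b"<a><b>first</b><b>second</b><b>third</b></a></root>"
    )
    for index, text in walk_a_and_b(data):
        print(index, text)
```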

answered Oct 05 '22 11:10

user225312
