What's the difference between findall() and iterfind() in xml.etree.ElementTree?

Tags: python, xml, elementtree

I wrote a program like the one below:

import xml.etree.ElementTree as ET

xmlroot = ET.fromstring(my_xml_content)  # my_xml_content: a string holding the XML

for element in xmlroot.iterfind(".//mytag"):
    pass  # do something with element

It works fine on my Python (v2.7.1), but after I copied it to another computer with Python v2.6.x installed, iterfind() is not supported. The Python documentation lists the following descriptions:

findall(match)

Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.

iterfind(match)

Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order.

New in version 2.7.

My question is: are these two functions the same or not? What's the difference between them?

260

asked Jun 25 '15 07:06

john zhao

2 Answers

As indicated in the docs:

findall returns the complete list of elements matching the XPath expression given in match. We can use subscripts to access them, for example:

>>> import xml.etree.ElementTree as ET
>>> root = ET.fromstring("<a><b>c</b></a>")
>>> root.findall("./b")
[<Element 'b' at 0x02048C90>]
>>> lst = root.findall("./b")
>>> lst[0]
<Element 'b' at 0x02048C90>

We can also use a for loop to iterate through the list.

iterfind returns an iterator (a generator) rather than a list. In this case we cannot use subscripts to access the elements; we can only use the result where iterators are accepted, for example in a for loop.
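A minimal sketch of that difference, assuming Python 2.7 or later and a small document like the one above (the variable names are only illustrative):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>c</b><b>d</b></a>")

matches = root.iterfind("./b")  # an iterator, not a list

# Subscripting the iterator is expected to fail with TypeError.
try:
    matches[0]
    subscriptable = True
except TypeError:
    subscriptable = False

# Elements are pulled one at a time, either explicitly with next() ...
first = next(matches)

# ... or implicitly by a for loop or comprehension, which consumes the rest.
rest = [element.text for element in matches]
```

Note that the iterator is exhausted after one pass. If the matches are needed again, call iterfind again or keep a list.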

iterfind would be faster than findall in cases where you actually want to iterate through the matching elements (which is most of the time, in my experience). findall has to build the complete list before returning, whereas iterfind finds (yields) the next matching element only when next() is called on the iterator, which is what a for loop or similar construct does internally.

In cases where you want the list, both seem to have similar timing.

Performance test for both cases:

In [1]: import xml.etree.ElementTree as ET

In [2]: x = ET.fromstring('<a><b>c</b><b>d</b><b>e</b></a>')

In [3]: def foo(root):
   ...:     d = root.findall('./b')
   ...:     for y in d:
   ...:         pass
   ...: 

In [4]: def foo1(root):
   ...:     d = root.iterfind('./b')
   ...:     for y in d:
   ...:         pass
   ...: 

In [5]: %timeit foo(x)
100000 loops, best of 3: 9.24 µs per loop

In [6]: %timeit foo1(x)
100000 loops, best of 3: 6.65 µs per loop

In [7]: def foo2(root):
   ...:     return root.findall('./b')
   ...: 

In [8]: def foo3(root):
   ...:     return list(root.iterfind('./b'))
   ...: 

In [9]: %timeit foo2(x)
100000 loops, best of 3: 8.54 µs per loop

In [10]: %timeit foo3(x)
100000 loops, best of 3: 8.4 µs per loop
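Outside IPython, a similar comparison could be run with the standard timeit module. This is only a sketch: the function names and the repeat count are arbitrary choices, and the absolute numbers (and possibly even which variant wins) will depend on the machine and the Python version.

```python
import timeit
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>c</b><b>d</b><b>e</b></a>")

def loop_findall():
    # Builds the full list first, then iterates over it.
    for element in root.findall("./b"):
        pass

def loop_iterfind():
    # Iterates over matches as they are produced.
    for element in root.iterfind("./b"):
        pass

# Total seconds for 10000 calls of each function.
t_findall = timeit.timeit(loop_findall, number=10000)
t_iterfind = timeit.timeit(loop_iterfind, number=10000)

print("findall:  %.4f s" % t_findall)
print("iterfind: %.4f s" % t_iterfind)
```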

125

answered Nov 14 '22 00:11

Anand S Kumar

If you do

for element in xmlroot.iterfind(".//mytag"):
    pass  # do something with element

then the elements will be retrieved from the parsed XML tree one at a time (one element per loop iteration).

If you do

for element in xmlroot.findall(".//mytag"):
    pass  # do something with element

all the elements will be retrieved at once and stored into a (temporary) list. Only then will the for loop start to iterate over that list.

This means that the second method takes longer at the start (because it has to build that list) and uses more memory (for the same reason). Also, if you need to exit the for loop before you've reached the last element, you will have done unnecessary work. On the other hand, once you're inside the for loop, the second method will probably be somewhat faster. Usually, the benefits of the first method ("lazy evaluation") outweigh this drawback.
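The early-exit case might look like the sketch below. The sample document and the id attribute are made up for illustration; the point is only that with iterfind, matches after the break are never requested.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three matching elements.
xmlroot = ET.fromstring(
    "<root><mytag id='1'/><group><mytag id='2'/></group><mytag id='3'/></root>"
)

# Stop as soon as the wanted element is found.
wanted = None
for element in xmlroot.iterfind(".//mytag"):
    if element.get("id") == "2":
        wanted = element
        break

# Taking only the first match; None is returned if there is no match at all.
first = next(xmlroot.iterfind(".//mytag"), None)
```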

In your case, it's probably safe to switch to findall.
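If the same script has to run on both 2.6 and 2.7, one possible approach is a small wrapper that prefers iterfind and falls back to findall. The helper name iter_matches is hypothetical, not part of ElementTree, and the sample document is made up:

```python
import xml.etree.ElementTree as ET

def iter_matches(element, path):
    """Hypothetical helper: use iterfind() where available, else findall()."""
    finder = getattr(element, "iterfind", None)
    if finder is None:
        # No iterfind() (e.g. Python 2.6): wrap the list from findall().
        return iter(element.findall(path))
    return finder(path)

xmlroot = ET.fromstring(
    "<root><mytag>x</mytag><other><mytag>y</mytag></other></root>"
)

texts = [element.text for element in iter_matches(xmlroot, ".//mytag")]
```

Either way the calling loop stays the same; only the fallback path pays the cost of building the list up front.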

answered Nov 14 '22 00:11

Tim Pietzcker
