I wrote a program like the one below:
import xml.etree.ElementTree as ET
xmlroot = ET.fromstring([my xml content])
for element in xmlroot.iterfind(".//mytag"):
    do some thing
It works fine on my Python (v2.7.1), but after I copied it to another computer with Python v2.6.x installed, iterfind() is not supported. The Python documentation lists the following descriptions:
findall(match)
Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
iterfind(match)
Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order.
New in version 2.7.
My question is: are these two functions the same or not? What is the difference between them?
The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.
There are two ways to parse XML with the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document supplied as a file (a filename or file object), whereas fromstring() parses XML supplied as a string.
_setroot(element): Replaces the root element of the tree. The existing contents of the tree are discarded and replaced with the given element. getroot(): Returns the root element of the tree.
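As a rough illustration of the two parsing routes and getroot(), here is a minimal sketch. The sample XML and the variable names are made up for this example, and io.StringIO only stands in for a real file on disk.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<a><b>c</b></a>"  # illustrative sample document

# fromstring() parses a string and returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() takes a filename or file object and returns an ElementTree;
# getroot() then gives access to the root Element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(root_from_string.tag, root_from_file.tag)
```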
As indicated in the docs, findall returns the complete list of elements matching the match XPath. We can use subscripts to access them, for example:
>>> root = ET.fromstring("<a><b>c</b></a>")
>>> root.findall("./b")
[<Element 'b' at 0x02048C90>]
>>> lst = root.findall("./b")
>>> lst[0]
<Element 'b' at 0x02048C90>
We can also use a for loop to iterate through the list.
iterfind would be faster than findall in cases where you actually want to iterate through the returned elements (which is most of the time, in my experience), since findall has to create the complete list before returning, whereas iterfind finds (yields) the next matching element only when next(iter) is called on it (which is what happens internally when iterating with for or similar constructs).
In cases where you want the list, both seem to have similar timing.
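To make the list-versus-iterator distinction concrete, here is a small sketch (Python 2.7+ or 3.x; the variable names are only for illustration):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>c</b><b>d</b><b>e</b></a>")

# findall builds the whole list up front, so len() and indexing work.
matches = root.findall("./b")
count = len(matches)
first_text = matches[0].text

# iterfind returns an iterator; matches are produced only when requested.
lazy = root.iterfind("./b")
first_lazy = next(lazy)               # only the first match has been produced
remaining = [el.text for el in lazy]  # consume the rest

print(count, first_text, first_lazy.text, remaining)
```

Note that the iterator can only be walked once; if you need to index into the results or go over them again, findall (or list(iterfind(...))) is the one to use.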
Performance test for both cases -
In [1]: import xml.etree.ElementTree as ET
In [2]: x = ET.fromstring('<a><b>c</b><b>d</b><b>e</b></a>')
In [3]: def foo(root):
   ...:     d = root.findall('./b')
   ...:     for y in d:
   ...:         pass
   ...:
In [4]: def foo1(root):
   ...:     d = root.iterfind('./b')
   ...:     for y in d:
   ...:         pass
   ...:
In [5]: %timeit foo(x)
100000 loops, best of 3: 9.24 µs per loop
In [6]: %timeit foo1(x)
100000 loops, best of 3: 6.65 µs per loop
In [7]: def foo2(root):
   ...:     return root.findall('./b')
   ...:
In [8]: def foo3(root):
   ...:     return list(root.iterfind('./b'))
   ...:
In [9]: %timeit foo2(x)
100000 loops, best of 3: 8.54 µs per loop
In [10]: %timeit foo3(x)
100000 loops, best of 3: 8.4 µs per loop
If you do
for element in xmlroot.iterfind(".//mytag"):
    do some thing
then the elements will be retrieved from the XML tree one at a time (one element per loop iteration).
If you do
for element in xmlroot.findall(".//mytag"):
    do some thing
all the elements will be retrieved at once and stored in a (temporary) list. Only then will the for loop start to iterate over that list.
This means that the second method takes longer at the start (because it has to build that list) and uses more memory (same reason). Also, if you need to exit the for loop before you've reached the last element, you will have done unnecessary work. On the other hand, once you're inside the for loop, the second method will probably be somewhat faster. Usually, the benefits of the first method ("lazy evaluation") outweigh this drawback.
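The early-exit case can be sketched as follows. The sample XML, the id attribute, and the first_match helper are hypothetical, chosen just to show the loop stopping at the first hit without building a full list:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<items><mytag id='1'/><mytag id='2'/><mytag id='3'/></items>"
)

def first_match(root, path, wanted_id):
    """Return the first element at `path` whose id equals wanted_id, or None."""
    for el in root.iterfind(path):
        if el.get("id") == wanted_id:
            return el  # stop here; later matches are never produced
    return None

found = first_match(root, ".//mytag", "2")
print(found.get("id"))
```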
In your case, it's probably safe to switch to findall.
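If the same script has to run on both interpreters, one possible approach is a small wrapper that uses iterfind when the element provides it and falls back to findall otherwise. iter_matches is a hypothetical helper name, not part of ElementTree, and this is only a sketch:

```python
import xml.etree.ElementTree as ET

def iter_matches(element, path):
    """Hypothetical helper: prefer iterfind, fall back to findall."""
    finder = getattr(element, "iterfind", None)
    if finder is None:
        # No iterfind (e.g. Python 2.6): iterate over the list from findall.
        return iter(element.findall(path))
    return finder(path)

root = ET.fromstring("<r><mytag>x</mytag><s><mytag>y</mytag></s></r>")
texts = [el.text for el in iter_matches(root, ".//mytag")]
print(texts)
```

Either branch yields the same elements in document order; only the laziness differs.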