 

Python lxml.etree - Is it more effective to parse XML from string or directly from link?

With the Python lxml.etree library, is it more efficient to parse XML directly from a link to an online XML file, or to use a different library (such as urllib2) to fetch the document as a string and then parse that string? Or does it make no difference at all?

Method 1 - Parse directly from link

from lxml import etree as ET

parsed = ET.parse(url_link)

Method 2 - Parse from string

from lxml import etree as ET
import urllib2

xml_string = urllib2.urlopen(url_link).read()
# fromstring() is a module-level function (not ET.parse.fromstring);
# it returns the root Element rather than an ElementTree
parsed = ET.fromstring(xml_string)
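urllib2 exists only in Python 2. As a hedged sketch, assuming Python 3, Method 2 would use urllib.request instead; the function name parse_from_string is hypothetical. The sketch falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since fromstring() behaves the same way in both for this purpose.

```python
from urllib.request import urlopen

try:
    from lxml import etree as ET
except ImportError:
    # Fallback: the stdlib ElementTree has a compatible fromstring().
    import xml.etree.ElementTree as ET


def parse_from_string(url):
    """Method 2 in Python 3: read the whole response, then parse the bytes."""
    with urlopen(url) as response:
        xml_bytes = response.read()
    # fromstring() returns the root Element, not an ElementTree,
    # so there is no .getroot() step here.
    return ET.fromstring(xml_bytes)
```

Usage would be root = parse_from_string(url_link) followed by, for example, print(root.tag). Passing bytes rather than a decoded str lets the parser honour any encoding declared in the XML header.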

Or is there a more efficient method than either of these, for example saving the XML to a .xml file on the desktop and then parsing that file?
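Two further options, sketched under the same assumptions (Python 3, hypothetical function names, lxml with a stdlib fallback): hand the open response object straight to parse() so the parser reads it in chunks instead of the code first building one big string, or copy the document to a local file once and parse it from disk. The local-file route mainly pays off when the same document is parsed repeatedly; a single parse still includes the full download.

```python
import shutil
from urllib.request import urlopen

try:
    from lxml import etree as ET
except ImportError:
    # Fallback: the stdlib ElementTree.parse() also accepts files and paths.
    import xml.etree.ElementTree as ET


def parse_streamed(url):
    """Pass the file-like response directly to parse(); no full string is built first."""
    with urlopen(url) as response:
        return ET.parse(response)  # returns an ElementTree


def parse_via_local_file(url, path):
    """Download once to `path` (copied in chunks), then parse from disk."""
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    # Later runs could call ET.parse(path) directly and skip the download.
    return ET.parse(path)
```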

Asked by Isaac, Apr 01 '14 18:04



1 Answer

I timed the two methods with a simple timing wrapper, applied below as the @timing decorator (its definition is not shown).
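The answer does not show how @timing is defined. One possible definition, purely hypothetical and written for Python 3, is a decorator that records each call's wall-clock duration in milliseconds so the average over 100 runs can be computed afterwards. Timings taken this way include network time as well as parsing time.

```python
import functools
import time


def timing(func):
    """Hypothetical stand-in for the answer's @timing decorator.

    Appends each call's duration (ms) to wrapper.timings_ms.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.timings_ms.append((time.perf_counter() - start) * 1000.0)

    wrapper.timings_ms = []
    return wrapper


@timing
def example():
    # Placeholder workload; the answer times the two parse functions instead.
    return sum(range(1000))


for _ in range(100):
    example()

average_ms = sum(example.timings_ms) / len(example.timings_ms)
print(f"Average of {len(example.timings_ms)} = {average_ms:.4f} ms")
```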

Method 1 - Parse XML Directly From Link

from lxml import etree as ET

@timing
def parseXMLFromLink():
    parsed = ET.parse(url_link)
    print(parsed.getroot())

for _ in range(100):
    parseXMLFromLink()

Average of 100 runs: 98.4035 ms

Method 2 - Parse XML From String Returned By Urllib2

from lxml import etree as ET
import urllib2

@timing
def parseXMLFromString():
    xml_string = urllib2.urlopen(url_link).read()
    parsed = ET.fromstring(xml_string)
    print(parsed)

for _ in range(100):
    parseXMLFromString()

Average of 100 runs: 286.9630 ms

So, anecdotally, letting lxml fetch and parse the link directly is the faster method. It is not clear whether it would be faster to download a large XML document to the hard drive first and then parse it from there. Presumably, unless the document is huge and the parsing more intensive, parseXMLFromLink() would still be quicker, since it is the urllib2 download that seems to slow the second function down.

I ran this a few times and the results stayed the same.
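Both timings above include a fresh download on every iteration, so they mix network time with parsing time. A hedged sketch (Python 3, hypothetical helper name average_parse_ms, lxml with a stdlib fallback) that downloads once and then times only the parsing step with timeit would show how much of each figure is actually parsing. The generated sample document is a stand-in for the real feed.

```python
import timeit

try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET


def average_parse_ms(xml_bytes, runs=100):
    """Average time in ms to parse bytes that are already in memory.

    The download happens once, outside the timed loop, so network
    time is excluded from the measurement.
    """
    total_seconds = timeit.timeit(lambda: ET.fromstring(xml_bytes), number=runs)
    return total_seconds / runs * 1000.0


# Stand-in document; in practice, read the real feed once, e.g. via urlopen().
sample = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
print(f"parse only: {average_parse_ms(sample):.4f} ms")
```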

Answered by Isaac, Sep 21 '22 20:09
