Python lxml.etree - Is it more effective to parse XML from string or directly from link?

Question

With the lxml.etree Python library, is it more efficient to parse XML directly from a link to an online XML file, or is it better to use a different library (such as urllib2) to return a string and then parse that? Or does it make no difference at all?

Method 1 - Parse directly from link

from lxml import etree as ET

parsed = ET.parse(url_link)

Method 2 - Parse from string

from lxml import etree as ET
import urllib2

xml_string = urllib2.urlopen(url_link).read()
parsed = ET.fromstring(xml_string)

# note: I do not have access to python 
# at the moment, so not sure whether 
# the .fromstring() function is correct
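For reference, here is a Python 3 sketch of Method 2 (in Python 3, urllib2.urlopen lives in urllib.request). fromstring() is a module-level function, ET.fromstring(), and returns the root element directly, while ET.parse() returns an ElementTree whose root comes from getroot(). The second helper hands the open response straight to ET.parse() instead of reading it into a string first; it is an untimed alternative, not something measured in this thread. lxml's documentation lists local files, HTTP URLs and FTP URLs as sources parse() can read directly, so a urllib-based route like this is also the usual way to handle an HTTPS link. The function names, the sample XML and the stdlib fallback are assumptions for illustration, and the demo uses a local file:// URL so it runs without network access.

```python
import pathlib
import tempfile
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

try:
    from lxml import etree as ET  # as in the question
except ImportError:
    import xml.etree.ElementTree as ET  # assumed stdlib fallback; same calls used here


def parse_via_string(url):
    """Method 2 (hypothetical helper name): read the whole body, then parse it."""
    with urlopen(url) as response:
        xml_bytes = response.read()
    return ET.fromstring(xml_bytes)  # returns the root Element


def parse_via_stream(url):
    """Untimed variant: pass the open response (a file-like object) to parse()."""
    with urlopen(url) as response:
        tree = ET.parse(response)  # returns an ElementTree
    return tree.getroot()


# Made-up document standing in for the online XML file.
SAMPLE_XML = b"<catalog><item>first</item><item>second</item></catalog>"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "sample.xml"
    path.write_bytes(SAMPLE_XML)
    url = path.as_uri()  # file:// URL, so no network is needed for the demo
    via_string = parse_via_string(url)
    via_stream = parse_via_stream(url)

print(via_string.tag, via_stream.tag)  # catalog catalog
```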

Or is there a more efficient method than either of these, e.g. saving the XML to a .xml file on the desktop and then parsing it from there?
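Saving to disk does not remove the download, it only moves it, so presumably it helps only when the same document is parsed several times. A minimal sketch of that approach, where save_then_parse, the file name feed.xml and the sample bytes are hypothetical:

```python
import os
import tempfile

try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET  # assumed stdlib fallback


def save_then_parse(xml_bytes, path):
    """Write already-downloaded XML bytes to disk, then parse the saved file."""
    with open(path, "wb") as fh:
        fh.write(xml_bytes)
    return ET.parse(path)  # ElementTree; call getroot() for the root element


# Made-up bytes standing in for a response body fetched earlier.
downloaded = b"<feed><entry>a</entry><entry>b</entry></feed>"

with tempfile.TemporaryDirectory() as tmp:
    saved_path = os.path.join(tmp, "feed.xml")
    tree = save_then_parse(downloaded, saved_path)
    # Later runs could call ET.parse(saved_path) again without re-downloading.
    reparsed = ET.parse(saved_path)

print(tree.getroot().tag, len(reparsed.getroot()))  # feed 2
```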

Isaac · Accepted Answer

I timed the two methods with a simple timing wrapper.
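The @timing decorator itself isn't shown, so here is a hypothetical version that records each call's duration in milliseconds so an average over 100 runs can be computed. It is Python 3 (time.perf_counter needs 3.3+); under Python 2, as in the code below, time.time() would be the substitute. Since both timed functions fetch and parse inside the call, averages gathered this way include network time as well as parsing time. The build_list workload is a placeholder.

```python
import functools
import time


def timing(func):
    """Hypothetical stand-in for the answer's @timing decorator.

    Records the wall-clock duration of every call, in milliseconds,
    on the wrapper's durations_ms list.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.durations_ms.append((time.perf_counter() - start) * 1000.0)

    wrapper.durations_ms = []
    return wrapper


@timing
def build_list():
    # Placeholder workload; the answer times parseXMLFromLink() etc. instead.
    return list(range(10000))


for _ in range(100):
    build_list()

average_ms = sum(build_list.durations_ms) / len(build_list.durations_ms)
print("Average of 100 = %.4f ms" % average_ms)
```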

Method 1 - Parse XML Directly From Link

from lxml import etree as ET

@timing
def parseXMLFromLink():
    parsed = ET.parse(url_link)
    print parsed.getroot()

for _ in range(100):
    parseXMLFromLink()

Average over 100 runs: 98.4035 ms

Method 2 - Parse XML From String Returned By urllib2

from lxml import etree as ET
import urllib2

@timing
def parseXMLFromString():
    xml_string = urllib2.urlopen(url_link).read()
    parsed = ET.fromstring(xml_string)
    print parsed

for _ in range(100):
    parseXMLFromString()

Average over 100 runs: 286.9630 ms

So, anecdotally, parsing directly from the link with lxml seems to be the faster method. It's not clear whether downloading large XML documents and then parsing them from the hard drive would be faster, but presumably, unless the document is huge and the parsing task more intensive, parseXMLFromLink() would still be quicker, since urllib2 seems to be what slows the second function down.

I ran this a few times and the results stayed the same.
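One way to check the hunch that fetching, not parsing, accounts for the gap is to time the parse step alone on bytes already in memory. A sketch using timeit, assuming a synthetic document (1,000 elements, an arbitrary size) is roughly representative of the real feed:

```python
import io
import timeit

try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET  # assumed stdlib fallback

# Synthetic document; 1,000 <item> elements is an arbitrary size for illustration.
SAMPLE_XML = (
    "<root>"
    + "".join("<item n='%d'>x</item>" % i for i in range(1000))
    + "</root>"
).encode("utf-8")

RUNS = 100

# Time only parsing; the bytes are already in memory, so there is no network I/O.
total_fromstring = timeit.timeit(lambda: ET.fromstring(SAMPLE_XML), number=RUNS)
total_parse = timeit.timeit(lambda: ET.parse(io.BytesIO(SAMPLE_XML)), number=RUNS)

print("fromstring:     %.4f ms per parse" % (total_fromstring / RUNS * 1000))
print("parse(BytesIO): %.4f ms per parse" % (total_parse / RUNS * 1000))
```

If these per-parse figures come out far below the averages measured above, most of the difference between the two methods lies in how the document is fetched rather than in parsing.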

Tags: python, parsing, xml, urllib2, lxml
