I'm trying to parse an XML document I retrieve from the web, but it crashes after parsing with this error:
': failed to load external entity "<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="GreenButtonDataStyleSheet.xslt"?>
That is the second line in the XML that is downloaded. Is there a way to prevent the parser from trying to load the external entity, or another way to solve this? This is the code I have so far:
import urllib2 import lxml.etree as etree file = urllib2.urlopen("http://www.greenbuttondata.org/data/15MinLP_15Days.xml") data = file.read() file.close() tree = etree.parse(data)
In concert with what mzjn said, if you do want to pass a string to etree.parse(), just wrap it in a StringIO object.
Example:
from lxml import etree from StringIO import StringIO myString = "<html><p>blah blah blah</p></html>" tree = etree.parse(StringIO(myString))
This method is used in the lxml documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With