I am using Python 2.7.3 on Mac OS X and have lxml 3.3.3 installed. I have several XML files in the same directory, for instance MyDir/file1.xml and MyDir/file2.xml. I am trying to load each one into Python and extract the relevant information. However, I can't seem to get the etree parser to work. My code is very simple:
from lxml import etree
from os import listdir
from os.path import isfile, join
xmlfiles = [x for x in listdir("MyDir") if isfile(join("MyDir",x))]
for file in xmlfiles:
    doc = etree.parse(file)
    # get the stuff I need
However, the parser keeps throwing the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
  File "parser.pxi", line 1748, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102066)
  File "parser.pxi", line 1774, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:102330)
  File "parser.pxi", line 1678, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:101365)
  File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:96817)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91722)
IOError: Error reading file 'File1.xml': failed to load external entity "File1.xml"
I've looked at several answers on here, but they are all for specific questions, mostly dealing with feeding the parser an HTML file, whereas I'm just feeding it an XML file already stored on my local machine. Can anybody please help me figure out why this isn't working properly?
Also, is there a better way to parse and extract information from XML files using Python than the approach I'm taking (assuming I get it to work)?
Thanks
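The error occurs because listdir() returns bare file names without the directory, so etree.parse() looks for the file in the current working directory instead of in MyDir. I'd rather use glob.iglob() with a *.xml file mask, which yields paths that already include the directory. This is more explicit and safer: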
import glob
from lxml import etree

for filename in glob.iglob("MyDir/*.xml"):
    tree = etree.parse(filename)  # filename already includes "MyDir/"
    print(tree.getroot())
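If you want to keep the listdir() approach from the question, one option is to join each name back onto the directory before handing it to the parser. The following is only a minimal sketch: parse_xml_dir is a hypothetical helper name, and the fallback to the standard library's xml.etree.ElementTree is an assumption added so the snippet still runs where lxml is not installed.

```python
import os

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library, which has a similar parse() API.
    import xml.etree.ElementTree as etree


def parse_xml_dir(directory):
    """Parse every .xml file in `directory` and return {file name: root element}."""
    roots = {}
    for name in sorted(os.listdir(directory)):
        # listdir() returns bare names, so rebuild the full path before parsing.
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(".xml"):
            roots[name] = etree.parse(path).getroot()
    return roots
```

Calling parse_xml_dir("MyDir") should then give you one root element per XML file, from which you can read root.tag or search with root.findall().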
Hope that helps.