 

Trouble Using LXML ETREE to Parse XML Files on Local Machine With Python

Tags: python, xml, lxml

I am using Python 2.7.3 on Mac OS X and have lxml version 3.3.3 installed. I have several XML files in the same directory, for instance MyDir/file1.xml and MyDir/file2.xml. I am trying to load each one into Python and extract the relevant information. However, I can't seem to get the etree parser to work. My code is very simple:

 from lxml import etree
 from os import listdir
 from os.path import isfile, join

 xmlfiles = [x for x in listdir("MyDir") if isfile(join("MyDir",x))]

 for file in xmlfiles:
     doc = etree.parse(file)
     # get the stuff I need

However, the parser keeps raising the following error:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
   File "parser.pxi", line 1748, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102066)
   File "parser.pxi", line 1774, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:102330)
   File "parser.pxi", line 1678, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:101365)
   File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:96817)
   File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
   File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
   File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91722)
 IOError: Error reading file 'File1.xml': failed to load external entity "File1.xml"

I've looked at several answers on here, but they all address specific questions, mostly about feeding the parser an HTML file, whereas I'm just feeding it an XML file already stored on my local machine. Can anybody please help me figure out why this isn't working properly?

Also, is there a better way to parse and extract information from XML files using Python than the approach I'm taking (assuming I get it to work)?

Thanks

Mark Clements, asked Jun 11 '26 11:06


1 Answer

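I'd rather use glob.iglob() with a *.xml file mask instead. listdir() returns bare file names, so etree.parse(file) looks for them in the current working directory rather than in MyDir, which is why the file fails to load. glob returns paths that already include the directory, and the mask skips anything that isn't an .xml file. This is more explicit and safer: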

import glob
from lxml import etree

for filename in glob.iglob("MyDir/*.xml"):
    tree = etree.parse(filename)
    print(tree.getroot())
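
If you want this wrapped up for reuse, here is a minimal sketch, assuming the files live in a directory such as MyDir. The first helper uses a *.xml glob pattern; the second keeps os.listdir() but joins the directory onto each name before parsing. The helper names parse_xml_dir and parse_listdir are made up for illustration, and the ImportError fallback to xml.etree.ElementTree is my own assumption for machines where lxml is not installed.

```python
import glob
import os

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the standard library, whose parse() and
    # getroot() are enough for this sketch.
    import xml.etree.ElementTree as etree


def parse_xml_dir(directory):
    """Parse every *.xml file in `directory`; return {path: root element}."""
    roots = {}
    pattern = os.path.join(directory, "*.xml")
    for path in sorted(glob.iglob(pattern)):
        # `path` already includes the directory, so it does not depend
        # on the current working directory being `directory`.
        roots[path] = etree.parse(path).getroot()
    return roots


def parse_listdir(directory):
    """Same result with os.listdir(): join each bare name first."""
    roots = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)  # the step missing in the question
        if os.path.isfile(path) and name.endswith(".xml"):
            roots[path] = etree.parse(path).getroot()
    return roots
```

Calling parse_xml_dir("MyDir") and looping over the returned dictionary's items gives you each path together with its root element, from which you can read root.tag or walk the children.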

Hope that helps.

alecxe, answered Jun 14 '26 00:06


