Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 3.4 lxml.etree: Start tag expected, '<' not found, line 1, column 1

Tags:

python

xml

lxml

I want to take some simple xml files and convert them all to CSV in one go (though this code is just for one at a time). It looks to me like there are no official name spaces, but I'm not sure. I have this code (I used one header, SubmittingSystemVendor, but I really want to write all of them to CSV:

import csv
import lxml.etree
x = r'C:\Users\...\jh944.xml'

with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow('SubmittingSystemVendor')
    root = lxml.etree.fromstring(x)

    writer.writerow(row)

Here is a sample of the XML file:

<?xml version="1.0" encoding="utf-8"?>
<EOYGeneralCollectionGroup SchemaVersionMajor="2014-2015" SchemaVersionMinor="1" CollectionId="157" SubmittingSystemName="MISTAR" SubmittingSystemVendor="WayneRESA" SubmittingSystemVersion="2014" xsi:noNamespaceSchemaLocation="http://cepi.state.mi.us/msdsxml/EOYGeneralCollection2014-20151.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <EOYGeneralCollection>
        <SubmittingEntity>
            <SubmittingEntityTypeCode>D</SubmittingEntityTypeCode>
            <SubmittingEntityCode>82730</SubmittingEntityCode>
        </SubmittingEntity>

The error is:

lxml.etree: Start tag expected, '<' not found, line 1, column 1

like image 795
Dance Party Avatar asked Sep 10 '15 19:09

Dance Party


1 Answers

You are using lxml.etree.fromstring, but giving it a file path as the argument. This means it's trying to interpret "C:\Users...\jh944.xml" as the XML data to be parsed.

Instead, you want to open the file containing this XML. You can simply replace the call to fromstring with lxml.etree.parse, which will accept a filename or open file object as the argument.

like image 176
Brian Avatar answered Oct 27 '22 00:10

Brian