
python sax error "junk after document element"

Tags: python, sax

I use Python's SAX parser to parse an XML file. The file is actually a combination of multiple XML files, so it looks like this:

<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />

My Python code is below. It raises a "junk after document element" error. Any ideas on how to solve this problem? Thanks.

from xml.sax.handler import ContentHandler
from xml.sax import make_parser, SAXException

class PostHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.find = 0
        self.buffer = ''
        self.mapping = {}
    def startElement(self, name, attrs):
        if name == 'row':
            self.find = 1
            self.body = attrs["body"]
            print(attrs["body"])
    # SAX calls characters(), not character()
    def characters(self, data):
        if self.find == 1:
            self.buffer += data
    def endElement(self, name):
        if self.find == 1:
            self.mapping[self.body] = self.buffer
            print(self.mapping)

parser = make_parser()
handler = PostHandler()
parser.setContentHandler(handler)
try:
    parser.parse(open("2.xml"))
except SAXException as exc:
    # report the parse error instead of leaving the handler empty
    print(exc)
chnet asked Apr 04 '10 15:04



2 Answers

xmldata = '''
<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />
'''

Add a wrapper tag around the data. I've used ElementTree since it's much simpler, but you should be able to do the same with any parser:

from xml.etree import ElementTree as etree

# wrap the data
xmldata = '<rows>' + xmldata + '</rows>'

rows = etree.fromstring(xmldata)
for row in rows:
    print(row.attrib)

Results in

{'age': '40',
 'body': 'blalalala...',
 'creationdate': '03/10/10',
 'name': 'abc'}
{'age': '50',
 'body': 'blalalala...',
 'creationdate': '03/10/09',
 'name': 'bcd'}
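
To make the cause concrete, here is a minimal sketch that parses the same two rows with and without a wrapper. It assumes Python 3; the variable names `data`, `error` and `rows` are only illustrative. An XML document may have exactly one top-level element, so the second `row` is what the parser reports as junk.

```python
from xml.etree import ElementTree as etree

# Two sibling elements with no common root, as in the question
data = '''
<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />
'''

# Without a wrapper the second <row> is rejected
try:
    etree.fromstring(data)
    error = None
except etree.ParseError as exc:
    error = str(exc)
print(error)

# With a synthetic root element the same text parses cleanly
rows = etree.fromstring('<rows>' + data + '</rows>')
print(len(rows))
```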
nosklo answered Sep 16 '22 21:09



It seems that your XML file has no root element. Wrap your row elements in a single rows element.
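
If you want to stay with SAX, one possible way to add the wrapper without editing the file is to feed the parser incrementally. The following is only a sketch, assuming Python 3 and the default expat-based reader returned by `make_parser()`, which supports `feed()` and `close()`. `RowHandler` and `parse_rows` are hypothetical names, and an in-memory `io.StringIO` stands in for the real file. It also assumes the input has no XML declaration, since that would end up inside the wrapper.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class RowHandler(ContentHandler):
    """Collects the attributes of every <row> element."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def startElement(self, name, attrs):
        if name == "row":
            # copy the SAX attributes object into a plain dict
            self.rows.append(dict(attrs.items()))


def parse_rows(stream, wrapper="rows"):
    """Parse sibling <row/> elements by feeding a synthetic root around them."""
    handler = RowHandler()
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.feed("<%s>" % wrapper)    # open the synthetic root
    for line in stream:
        parser.feed(line)            # pass the real content through unchanged
    parser.feed("</%s>" % wrapper)   # close the synthetic root
    parser.close()
    return handler.rows


# Stand-in for open("2.xml"); any text stream of <row/> elements works
sample = io.StringIO(
    '<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />\n'
    '<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />\n'
)
rows = parse_rows(sample)
print([row["name"] for row in rows])  # ['abc', 'bcd']
```

Feeding line by line keeps memory use low for large files, which is usually the reason for choosing SAX in the first place.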

Yaroslav answered Sep 19 '22 21:09
