I am using Python's SAX parser to parse an XML file. The file is actually a concatenation of multiple XML files and looks like this:
<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />
My Python code is below. It fails with a "junk after document element" error. Any good ideas for solving this problem? Thanks.
from xml.sax.handler import ContentHandler
from xml.sax import make_parser, SAXException
import sys

class PostHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.find = 0
        self.buffer = ''
        self.mapping = {}

    def startElement(self, name, attrs):
        if name == 'row':
            self.find = 1
            self.body = attrs["body"]
            print(attrs["body"])

    # SAX calls characters(), not character(), for text content
    def characters(self, data):
        if self.find == 1:
            self.buffer += data

    def endElement(self, name):
        if self.find == 1:
            self.mapping[self.body] = self.buffer
            print(self.mapping)

parser = make_parser()
handler = PostHandler()
parser.setContentHandler(handler)
try:
    parser.parse(open("2.xml"))
except SAXException as e:
    # report the parse error instead of silently swallowing it
    print(e)
Add a wrapper tag around the data. I've used ElementTree because it's simpler, but you should be able to do the same with any parser:
from xml.etree import ElementTree as etree

data = '''
<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />
'''
# wrap the data in a single root element
xmldata = '<rows>' + data + '</rows>'
rows = etree.fromstring(xmldata)
for row in rows:
    print(row.attrib)
Results in (the exact dict formatting and key order may differ depending on your Python version):
{'age': '40',
'body': 'blalalala...',
'creationdate': '03/10/10',
'name': 'abc'}
{'age': '50',
'body': 'blalalala...',
'creationdate': '03/10/09',
'name': 'bcd'}
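Since the question uses SAX, here is a minimal sketch of the same wrapping trick with the standard library's `xml.sax.parseString`, assuming the whole file fits in memory. `RowHandler` is a hypothetical handler written for illustration: it copies each `<row>`'s attributes into a plain dict rather than reproducing the question's body-to-text mapping.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RowHandler(ContentHandler):
    """Hypothetical handler: collect the attributes of every <row> as a dict."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.rows = []

    def startElement(self, name, attrs):
        if name == 'row':
            # attrs is an Attributes object; copy it into a plain dict
            self.rows.append(dict(attrs.items()))


data = '''
<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />
<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />
'''

handler = RowHandler()
# Wrap the fragments in a single root element, then hand the bytes to SAX
xml.sax.parseString(('<rows>' + data + '</rows>').encode('utf-8'), handler)

for row in handler.rows:
    print('%s: %s' % (row['name'], row['body']))
```

Encoding to bytes keeps the call compatible with Python versions whose `parseString` only accepts bytes.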
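It seems your XML file does not have a root element. A well-formed XML document must have exactly one root element, so once the first <row .../> is closed the parser treats everything after it as "junk after document element". Wrap your row elements in a single rows element.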
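If the file is large, you may not want to read it into one string first. The parser returned by `make_parser()` in the default standard-library setup is expat-based and supports incremental `feed()`/`close()`, so a synthetic root tag can be fed around the file contents chunk by chunk. A minimal sketch, assuming `2.xml` has no XML declaration or DOCTYPE at the top (those must precede the root element, so prepending `<rows>` would break them); `parse_fragments` and `RowNameCollector` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class RowNameCollector(ContentHandler):
    """Hypothetical handler: remember the name attribute of every <row>."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        if name == 'row':
            self.names.append(attrs.get('name'))


def parse_fragments(stream, handler, chunk_size=64 * 1024):
    """Parse a binary stream of root-less <row/> fragments by feeding a
    synthetic <rows> start tag, the stream in chunks, then </rows>."""
    parser = xml.sax.make_parser()  # expat-based reader with feed()/close()
    parser.setContentHandler(handler)
    parser.feed(b'<rows>')
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    parser.feed(b'</rows>')
    parser.close()  # finishes parsing; raises SAXParseException on bad input


# io.BytesIO stands in for open("2.xml", "rb") here
sample = io.BytesIO(
    b'<row name="abc" age="40" body="blalalala..." creationdate="03/10/10" />\n'
    b'<row name="bcd" age="50" body="blalalala..." creationdate="03/10/09" />\n'
)
collector = RowNameCollector()
parse_fragments(sample, collector)
print(collector.names)
```

With the real file you would call `parse_fragments(open("2.xml", "rb"), handler)`. Opening in binary mode lets expat handle the character decoding, including multi-byte characters split across chunk boundaries.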