So I'm looping through multiple large xml files and generating MySQL insert statements to add rental property listings to a database. Problem is, a number of elements contain special characters like Å or ç or even some dashes and bullets.
I can get the elements just fine, and I can make a string to hold the insert statement, but as soon as I try to execute the statement I get dumped out to the next file.
I've got the insert in its own try block, thinking that would just allow me to move on to the next listing rather than scrap the remainder of the xml document, but that's not happening.
I've tried making sure the insert is utf-8 encoded, but it's not making a difference.
Here is the gist of the code I've got:
try:
print "About to read file: "+fullpath
data = f.read() #read the file into a string
print "Data read from file, now closing: "+fullpath
f.close() #close the file, we don't need it any more
dom = minidom.parseString(data) #parse the xml
#get the first child node -- <property_data>
property_data = dom.firstChild
properties = property_data.getElementsByTagName('property')
for property in properties:
try:
print "getting details"
details = property.getElementsByTagName('property_details')
for detail in details:
print "attempting to get detail values"
try:
checkin = getElementValue('check_in', detail)
name = stripCDATA(getElementValue('name', detail))
checkout = getElementValue('check_out', detail)
...etc, etc...
print "building insert string"
sql = u"""insert into PROPERTY(NAME, CHECKIN, CHECKOUT, etc...)
values(%s,%s,%s,...)""".encode('utf-8')
print "starting insert with query:"
print sql % (name,checkin,checkout, etc...)
try: #HERE IS WHERE THE PROBLEM HAPPENS
cursor.execute(sql,(name, checkin, checkout, ...))
#display number of rows affected
print "Number of rows inserted: %d" % cursor.rowcount
conn.commit()
except Exception as (errno, strerror):
print "Problem inserting the property. Error({0}): {1}".format(errno, strerror)
except Exception as (errno, strerror):
print "Problem with reading/inserting details. Error({0}): {1}".format(errno, strerror)
except Exception as (errno, strerror):
print "The loop broke with the following error({0}): {1}".format(errno, strerror)
errCount += 1
print "This has happened %d times" % (errCount)
except: #HERE IS WHERE I GET DUMPED TO
print "Something bad happened while reading and inserting"
As you can see, I've got lines printing at various points so I can see when things blow up.
I know it's parsing the file correctly, I know it's getting all my elements correctly, I know it's building the insert statement correctly, and as long as I hit a property with no special characters anywhere in any of the elements I grab, I know it's inserting into the database correctly. It's just as soon as it hits a special character that it breaks, and when it breaks it dumps me out 3 levels higher than it should. Yelling and pulling my hair out have been ineffective so far.
Any ideas?
As per the suggestion from @deadly I removed all the try...except blocks and got the following traceback:
Traceback (most recent call last):
File "dbinsert2.py", line 118, in cursor.execute(sql,([bunch of var names]))
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 159, in execute query = query % db.literal(args)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 264, in literal return self.escape(o, self.encoders)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 202, in unicode_literal return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 20: ordinal not in range(256)
Not many people are going to have the patience to work through all that code.
Start by getting rid of every try...except. Python will still happily raise exceptions without it.
You only need to use try...except if you want to do some special handling on an exception beyond bug finding. Print statements are better friends at this stage. Also, if you leave off the try...excepts (at least the way you're using them), Python will also print the traceback which is what you should also post along with your code.
Please post this traceback once you've tidied up your code.
EDIT: Thanks for the traceback. Now we can see that there's a mismatch between the encoding you're using (utf-8) and the default encoding that the MySQLdb Python library uses (latin-1). You need to pass charset='utf8' as a parameter to connect(). (No dash in 'utf8' as this is how MySQL stores its list of character sets.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With