Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Escaping unicode strings for MySQL in Python (avoiding exceptions.UnicodeEncodeError)

I am using Twisted to asynchronously access our database in Python. My code looks like this:

from twisted.enterprise import adbapi
from MySQLdb import _mysql as mysql

...

txn.execute("""
    INSERT INTO users_accounts_data_snapshots (accountid, programid, fieldid, value, timestamp, jobid)
    VALUES ('%s', '%s', '%s', '%s', '%s', '%s')
""" % (accountid, programid, record, mysql.escape_string(newrecordslist[record]), ended, jobid))

This worked until I came across this character: ®, which caused the thread to throw an exception: `exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 7: ordinal not in range(128)

However, if I don't use MySQLdb_mysql.escape_string(), I get database errors when input contains quotes etc (of course). The exception is occurring before the database is accessed so the collation of the database doesn't seem to matter at all.

What's the best way to escape this content without throwing exceptions on unicode characters? The ideal solution is one where I can pass unicode characters that won't interfere with the query along to MySQL unmolested; however, stripping the string of unicode characters, replacing them with question marks, mangling them or anything else that will stop the crashes would be acceptable.

like image 459
Andrew Gorcester Avatar asked Dec 12 '22 18:12

Andrew Gorcester


2 Answers

Do not format strings like this. It is a massive security hole. It is not possible to do the quoting correctly by yourself. Do not try.

Use the second parameter to 'execute'. Simply put, instead of txn.execute("... %s, %s ..." % ("xxx", "yyy")), do txn.execute("... %s, %s ...", ("xxx", "yyy")). Notice the comma instead of the percent sign. In other databases or with a different database binding, you might use a different character instead of "%s", like ? or :1, :2, :3 or :foo:, :bar:, :baz: but the idea is the same. (You can see the documentation for paramstyle in the DB-API 2.0 documentation if you are curious about alternatives.)

I've written about this in the past. The discussion on that post may be of particular interest to you.

Please also let me emphasize that this is the only correct way to do it. You may have seen MySQL documentation talking about quoting strings in various ways. You may have written applications in PHP which lacks a proper facility for passing database parameters. I guarantee that all of these sources of information are incorrect and lead to serious and ongoing security problems: do not interpolate parameters into your SQL strings.

like image 146
Glyph Avatar answered Jan 17 '23 17:01

Glyph


You can try:

newrecordslist[record].decode("utf-8")

Glyph is right about http://www.python.org/dev/peps/pep-0249/.

like image 36
baklarz2048 Avatar answered Jan 17 '23 18:01

baklarz2048