I'm trying to pull out a list of table names from a MySQL database. The pertinent parts of the code are as follows:
conn = pyodbc.connect('...')
cursor = conn.cursor()
for table in cursor.tables():
print table.table_name
For every table it prints a bunch of gibberish characters (boxes and question marks in diamonds). Using repr(table.table_name)
it prints:
u'\U00500041\U004c0050\U00430049\U00540041\U004f0049'
for the table named "APPLICATION".
If you treat each 32 bit character as two 16 bit characters you get the string "PALPCITAOI". Swapping pairs of characters (1 and 2, 3 and 4, etc.) it produces "APPLICATIO" (missing the "N"). Presumably, it swapped N and the null character, resulting in the missing N (assuming null-terminated strings).
How can I get the table names out properly?
This is running under Python 2.4, PyODBC 2.1, and MySQL 5.0.22.
Adding CHARSET=UTF8 to the connection string worked and I now have the application up and running. For example:
change
cnxn = pyodbc.connect('DSN=localhost')
to
cnxn = pyodbc.connect('DSN=localhost;CHARSET=UTF8')
Thank you Thomas for pointing me in the right direction. Upgrading to a later version of PyODBC may also work.
This is a known issue that was fixed in pyodbc 2.1.8:
https://code.google.com/p/pyodbc/issues/detail?id=78&can=1&q=UTF-16
Unfortunately for Ubuntu LTS users the version available for Precise Pangolin is 2.1.7 where this is issue still shows itself. If you want to stick with LTS I've backported pyodbc 3.0.6 - you can find it here:
https://launchpad.net/~gazoakley/+archive/precise-backports
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With