Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PyODBC outputting improper UTF-16

I'm trying to pull out a list of table names from a MySQL database. The pertinent parts of the code are as follows:

conn = pyodbc.connect('...')
cursor = conn.cursor()
for table in cursor.tables():
    print table.table_name

For every table it prints a bunch of gibberish characters (boxes and question marks in diamonds). Using repr(table.table_name) it prints:

u'\U00500041\U004c0050\U00430049\U00540041\U004f0049'

for the table named "APPLICATION".

If you treat each 32 bit character as two 16 bit characters you get the string "PALPCITAOI". Swapping pairs of characters (1 and 2, 3 and 4, etc.) it produces "APPLICATIO" (missing the "N"). Presumably, it swapped N and the null character, resulting in the missing N (assuming null-terminated strings).

How can I get the table names out properly?

This is running under Python 2.4, PyODBC 2.1, and MySQL 5.0.22.

like image 891
Jonathan Avatar asked Mar 28 '12 16:03

Jonathan


2 Answers

Adding CHARSET=UTF8 to the connection string worked and I now have the application up and running. For example:

change

cnxn = pyodbc.connect('DSN=localhost')

to

cnxn = pyodbc.connect('DSN=localhost;CHARSET=UTF8')

Thank you Thomas for pointing me in the right direction. Upgrading to a later version of PyODBC may also work.

like image 174
Jonathan Avatar answered Oct 07 '22 04:10

Jonathan


This is a known issue that was fixed in pyodbc 2.1.8:

https://code.google.com/p/pyodbc/issues/detail?id=78&can=1&q=UTF-16

Unfortunately for Ubuntu LTS users the version available for Precise Pangolin is 2.1.7 where this is issue still shows itself. If you want to stick with LTS I've backported pyodbc 3.0.6 - you can find it here:

https://launchpad.net/~gazoakley/+archive/precise-backports

like image 44
Gareth Oakley Avatar answered Oct 07 '22 05:10

Gareth Oakley