Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I configure pyodbc to correctly accept strings from SQL Server using freeTDS and unixODBC?

I can not get a valid string from an MSSQL server into python. I believe there is an encoding mismatch somewhere. I believe it is between the ODBC layer and python because I am able to get readable results in tsql and isql.

What character encoding does pyodbc expect? What do I need to change in the chain to get this to work?

Specific Example

Here is a simplified python script as an example:

#!/usr/bin/env python
import pyodbc

dsn = 'yourdb'
user = 'import'
password = 'get0lddata'
database = 'YourDb'

def get_cursor():
    con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
    conn = pyodbc.connect(con_string)
    return conn.cursor()

if __name__ == '__main__':
    c = get_cursor()
    c.execute("select id, name from recipe where id = 4140567")

    row = c.fetchone()
    if row:
        print row

The output of this script is:

(Decimal('4140567'), u'\U0072006f\U006e0061\U00650067')

Alternatively, if the last line of the script is changed to:

print "{0}, '{1}'".format(row.id, row.name)

Then the result is:

Traceback (most recent call last):
  File "/home/mdenson/projects/test.py", line 20, in <module>
    print "{0}, '{1}'".format(row.id, row.name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

A transcript using tsql to execute the same query:

root@luke:~# tsql -S cmw -U import -P get0lddata
locale is "C"
locale charset is "ANSI_X3.4-1968"
using default charset "UTF-8"
1> select id, name from recipe where id = 4140567
2> go
id      name
4140567 orange2
(1 row affected)

and also in isql:

root@luke:~# isql -v yourdb import get0lddata
SQL>  select id, name from recipe where id = 4140567
+----------------------+--------------------------+
| id                   | name                     |
+----------------------+--------------------------+
| 4140567              | orange2                  |
+----------------------+--------------------------+
SQLRowCount returns 1
1 rows fetched

So I have worked at this for the morning and looked high and low and haven't figured out what is amiss.

Details

Here are version details:

  • Client is Ubuntu 12.04
  • freetds v0.91
  • unixodbc 2.2.14
  • python 2.7.3
  • pyodbc 2.1.7-1 (from ubuntu package) & 3.0.7-beta06 (compiled from source)

  • Server is XP with SQL Server Express 2008 R2

Here are the contents of a few configuration files on the client.

/etc/freetds/freetds.conf

[global]
    tds version = 8.0
    text size = 64512
[cmw]
    host = 192.168.90.104
    port = 1433
    tds version = 8.0
    client charset = UTF-8

/etc/odbcinst.ini

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
FileUsage = 1

/etc/odbc.ini

[yourdb]
Driver = FreeTDS
Description = ODBC connection via FreeTDS
Trace = No
Servername = cmw
Database = YourDB
Charset = UTF-8
like image 797
MatthewD Avatar asked Dec 11 '12 21:12

MatthewD


2 Answers

So after continued work I am now getting unicode characters into python. Unfortunately the solution I've stumbled upon is about as satisfying as kissing your cousin.

I solved the problem by installing the python3 and python3-dev packages and then rebuilding pyodbc with python3.

Now that I've done this my scripts now work even though I am still running them with python 2.7.

So I don't know what was fixed by doing this, but it now works and I can move on to the project I started with.

like image 194
MatthewD Avatar answered Oct 24 '22 11:10

MatthewD


Any chance you're having a problem with a BOM (Byte Order Marker)? If so, maybe this snippet of code will help:

import codecs
if s.beginswith( codecs.BOM_UTF8 ):
    # The byte string s begins with the BOM: Do something.
    # For example, decode the string as UTF-8

if u[0] == unicode( codecs.BOM_UTF8, "utf8" ):
    # The unicode string begins with the BOM: Do something.
    # For example, remove the character.

# Strip the BOM from the beginning of the Unicode string, if it exists
u.lstrip( unicode( codecs.BOM_UTF8, "utf8" ) )

I found that snippet on this page.

like image 20
Benny Hill Avatar answered Oct 24 '22 10:10

Benny Hill