I can not get a valid string from an MSSQL server into python. I believe there is an encoding mismatch somewhere. I believe it is between the ODBC layer and python because I am able to get readable results in tsql and isql.
What character encoding does pyodbc expect? What do I need to change in the chain to get this to work?
Specific Example
Here is a simplified python script as an example:
#!/usr/bin/env python
import pyodbc
dsn = 'yourdb'
user = 'import'
password = 'get0lddata'
database = 'YourDb'
def get_cursor():
con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
conn = pyodbc.connect(con_string)
return conn.cursor()
if __name__ == '__main__':
c = get_cursor()
c.execute("select id, name from recipe where id = 4140567")
row = c.fetchone()
if row:
print row
The output of this script is:
(Decimal('4140567'), u'\U0072006f\U006e0061\U00650067')
Alternatively, if the last line of the script is changed to:
print "{0}, '{1}'".format(row.id, row.name)
Then the result is:
Traceback (most recent call last):
File "/home/mdenson/projects/test.py", line 20, in <module>
print "{0}, '{1}'".format(row.id, row.name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
A transcript using tsql to execute the same query:
root@luke:~# tsql -S cmw -U import -P get0lddata
locale is "C"
locale charset is "ANSI_X3.4-1968"
using default charset "UTF-8"
1> select id, name from recipe where id = 4140567
2> go
id name
4140567 orange2
(1 row affected)
and also in isql:
root@luke:~# isql -v yourdb import get0lddata
SQL> select id, name from recipe where id = 4140567
+----------------------+--------------------------+
| id | name |
+----------------------+--------------------------+
| 4140567 | orange2 |
+----------------------+--------------------------+
SQLRowCount returns 1
1 rows fetched
So I have worked at this for the morning and looked high and low and haven't figured out what is amiss.
Details
Here are version details:
pyodbc 2.1.7-1 (from ubuntu package) & 3.0.7-beta06 (compiled from source)
Server is XP with SQL Server Express 2008 R2
Here are the contents of a few configuration files on the client.
/etc/freetds/freetds.conf
[global]
tds version = 8.0
text size = 64512
[cmw]
host = 192.168.90.104
port = 1433
tds version = 8.0
client charset = UTF-8
/etc/odbcinst.ini
[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
FileUsage = 1
/etc/odbc.ini
[yourdb]
Driver = FreeTDS
Description = ODBC connection via FreeTDS
Trace = No
Servername = cmw
Database = YourDB
Charset = UTF-8
So after continued work I am now getting unicode characters into python. Unfortunately the solution I've stumbled upon is about as satisfying as kissing your cousin.
I solved the problem by installing the python3 and python3-dev packages and then rebuilding pyodbc with python3.
Now that I've done this my scripts now work even though I am still running them with python 2.7.
So I don't know what was fixed by doing this, but it now works and I can move on to the project I started with.
Any chance you're having a problem with a BOM (Byte Order Marker)? If so, maybe this snippet of code will help:
import codecs
if s.beginswith( codecs.BOM_UTF8 ):
# The byte string s begins with the BOM: Do something.
# For example, decode the string as UTF-8
if u[0] == unicode( codecs.BOM_UTF8, "utf8" ):
# The unicode string begins with the BOM: Do something.
# For example, remove the character.
# Strip the BOM from the beginning of the Unicode string, if it exists
u.lstrip( unicode( codecs.BOM_UTF8, "utf8" ) )
I found that snippet on this page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With