Setting query results encoding in cx_Oracle / UnicodeDecodeError with Chinese characters

I'm working with a database containing a lot of Chinese characters. My code goes something like this:

import cx_Oracle

connection = cx_Oracle.connect("%s/%s@%s:%s/%s" % (username, password, host, port, service_name))
cursor = connection.cursor()
cursor.execute('SELECT HOTEL_ID,CREATE_TIME,SOURCE,CONTENT,TITLE,RATE,UPDATE_TIME FROM T_FX_COMMENTS')

for row in cursor:
    # Stuff goes here
    pass

But I get this error:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    for row in cursor:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 26: illegal multibyte sequence

It seems the GBK codec cannot decode some of the bytes coming back. I want cx_Oracle to give me GB18030-encoded results instead of GBK. How do I do this?
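This kind of failure can be reproduced without a database, using only Python's built-in codecs. The sketch below is an illustration under assumptions: the sample character (U+20000, from CJK Extension B) is picked arbitrarily and is not taken from T_FX_COMMENTS, and `try_decode` is a hypothetical helper. It shows bytes that are valid GB18030 but that the 'gbk' codec rejects, which is one plausible cause of the traceback above, not a confirmed diagnosis.

```python
# A character outside GBK's repertoire, chosen purely for illustration.
ch = "\U00020000"

# GB18030 represents this character as a four-byte sequence.
raw = ch.encode("gb18030")


def try_decode(data, codec):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


gbk_result = try_decode(raw, "gbk")          # GBK cannot decode these bytes
gb18030_result = try_decode(raw, "gb18030")  # GB18030 round-trips them

print(len(raw), gbk_result, gb18030_result == ch)
```

If the stored data really does contain such sequences, decoding has to happen with GB18030 (or the client has to ask Oracle for a different character set altogether).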

cx_Oracle.Connection.encoding is read-only, and I haven't found anything in the cx_Oracle documentation that suggests I can change it.

I'm on Python 3.3.2 and cx_Oracle 5.1.2. There must be something I'm missing here. Help is appreciated!

sorbet asked Aug 16 '13 06:08


2 Answers

Try setting the NLS_LANG environment variable at the beginning of your program, before the first connection is opened:

import os
os.environ["NLS_LANG"] = ".GB18030"
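A slightly fuller sketch of the same idea follows. Two things here are assumptions rather than part of the answer above: `set_oracle_client_charset` is a hypothetical helper name, and the character set is written as ZHS32GB18030, which to my knowledge is the name Oracle uses for GB18030. If the short form .GB18030 is rejected by the client, that longer name may be worth trying. The variable is set before cx_Oracle is imported, on the assumption that the Oracle client reads NLS_LANG when it initializes.

```python
import os


def set_oracle_client_charset(charset):
    """Set NLS_LANG to '.<charset>' (hypothetical helper).

    The leading dot means only the character set is specified;
    language and territory are left at the client defaults.
    """
    os.environ["NLS_LANG"] = "." + charset
    return os.environ["NLS_LANG"]


# Assumption: ZHS32GB18030 is Oracle's name for the GB18030 character set.
set_oracle_client_charset("ZHS32GB18030")

# Import and connect only after the variable is in place, e.g.:
# import cx_Oracle
# connection = cx_Oracle.connect(...)
```

Setting the variable in the shell or service configuration that launches the script should have the same effect and avoids any dependence on import order.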
Maciek answered Oct 02 '22 16:10


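I was facing the same issue and solved it by setting the environment variable NLS_LANG to .AL32UTF8. AL32UTF8 is Oracle's name for UTF-8, and the leading dot leaves the language and territory parts of NLS_LANG at their defaults, so in effect it says "use UTF-8 regardless of language".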
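NLS_LANG follows the pattern LANGUAGE_TERRITORY.CHARSET, which is why a value that starts with a dot touches only the character set. The sketch below merely splits such a string to make the three parts visible; `split_nls_lang` is a hypothetical helper, not part of cx_Oracle or the Oracle client, and it does no validation of the names.

```python
def split_nls_lang(value):
    """Split 'LANGUAGE_TERRITORY.CHARSET' into (language, territory, charset).

    Parts that are absent from the input come back as empty strings.
    """
    locale_part, _, charset = value.partition(".")
    language, _, territory = locale_part.partition("_")
    return language, territory, charset


print(split_nls_lang(".AL32UTF8"))
# ('', '', 'AL32UTF8')
print(split_nls_lang("SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))
# ('SIMPLIFIED CHINESE', 'CHINA', 'ZHS16GBK')
```

With .AL32UTF8 the client receives UTF-8, which covers every character GB18030 can represent, so Python can decode the results without a Chinese-specific codec.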

daveoncode answered Oct 02 '22 16:10