I am having NVARCHAR
type column in my database. I am unable to convert the content of this column to plain string in my code. (I am using pyodbc
for the database connection).
# This unicode string is returned by the database
>>> my_string = u'\u4157\u4347\u6e65\u6574\u2d72\u3430\u3931\u3530\u3731\u3539\u3533\u3631\u3630\u3530\u3330\u322d\u3130\u3036\u3036\u3135\u3432\u3538\u2d37\u3134\u3039\u352d'
# prints something in chineese
>>> print my_string
䅗䍇湥整㐰㤱㔰㜱㔹㔳㘱㘰㔰㌰㈭〶〶ㄵ㐲㔸ⴷㄴ〹㔭
The closest I have gone is via encoding it to utf-16
as:
>>> my_string.encode('utf-16')
'\xff\xfeWAGCenter-04190517953516060503-20160605124857-4190-5'
>>> print my_string.encode('utf-16')
��WAGCenter-04190517953516060503-20160605124857-4190-5
But the actual value that I need as per the value store in database is:
WAGCenter-04190517953516060503-20160605124857-4190-51
I tried with encoding it to utf-8
, utf-16
, ascii
, utf-32
but nothing seemed to work.
Does anyone have the idea regarding what I am missing? And how to get the desired result from the my_string
.
Edit: On converting it to utf-16-le
, I am able to remove unwanted characters from start, but still one character is missing from end
>>> print t.encode('utf-16-le')
WAGCenter-04190517953516060503-20160605124857-4190-5
On trying for some other columns, it is working. What might be the cause of this intermittent issue?
You have a major problem in your database definition, in the way you store values in it, or in the way you read values from it. I can only explain what you are seeing, but neither why nor how to fix it without:
What you get is an ASCII string, where the 8 bits characters are grouped by pair to build 16 bit unicode characters in little endian order. As the expected string has an odd numbers of characters, the last character was (irremediably) lost in translation, because the original string ends with u'\352d'
where 0x2d is ASCII code for '-'
and 0x35 for '5'
. Demo:
def cvt(ustring):
l = []
for uc in ustring:
l.append(chr(ord(uc) & 0xFF)) # low order byte
l.append(chr((ord(uc) >> 8) & 0xFF)) # high order byte
return ''.join(l)
cvt(my_string)
'WAGCenter-04190517953516060503-20160605124857-4190-5'
The issue was, I was using UTF-16
in my odbcinst.ini
file where as I had to use UTF-8
format of character encoding.
Earlier I was changing it as an OPTION
parameter while making connection to PyODBC
. But later changing it in odbcinst.ini
file fixed the issue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With