
Python with MySQL Unicode problems

I need to call a MySQL stored procedure from my Python script. One of the parameters I pass is a Unicode string (in Russian), but I get an error:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

My script:

  self.db = MySQLdb.connect("localhost", "usr", "pass", "dbName")
  self.cursor = self.db.cursor()
  args = ("какой-то текст",)  # a string in Russian; the trailing comma makes this a tuple
  self.cursor.callproc('pr_MyProc', args)
  self.cursor.execute('SELECT @_pr_MyProc_2')  # get the result from the stored procedure
  result = self.cursor.fetchone()
  self.db.commit()

I've read that setting charset='utf8' should resolve this problem, but when I connect with this line:

self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName", charset='utf8')

I get another error:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' in position 20: surrogates not allowed

I've also tried setting the parameter use_unicode=True, but that doesn't work either.

Gleb asked Aug 23 '16 06:08


2 Answers

More things to check on: http://mysql.rjweb.org/doc.php/charcoll#python

Likely items:

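  • Start the code file with # -*- coding: utf-8 -*- (needed for non-ASCII literals in Python 2 source; Python 3 already assumes UTF-8)
  • Literals should be u'...' (in Python 2; Python 3 string literals are already Unicode)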

Can you extract the HEX? In utf8, какой-то текст should be: D0BA D0B0 D0BA D0BE D0B9 2D D182 D0BE 20 D182 D0B5 D0BA D181 D182
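
One way to get that hex on the Python side, as a minimal sketch assuming Python 3. The helper name utf8_hex is made up for illustration. It groups the UTF-8 bytes per character, so the output can be compared with what MySQL's HEX() returns for the stored value:

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of each character as uppercase hex, space-separated."""
    return " ".join(ch.encode("utf-8").hex().upper() for ch in text)


sample = "какой-то текст"
print(utf8_hex(sample))
```

Each Cyrillic letter should come out as a two-byte group starting with D0 or D1, while the hyphen and the space stay single bytes (2D and 20).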

Rick James answered Oct 05 '22 23:10


Here are some thoughts, maybe not a full answer. I've played with Python/MySQL/UTF-8/Unicode in the past, and these are the things I remember:

Looking at the comment in SaltStack's mysql module:

https://github.com/saltstack/salt/blob/develop/salt/modules/mysql.py#L314-L322

# MySQLdb states that this is required for charset usage
# but in fact it's more than it's internally activated
# when charset is used, activating use_unicode here would
# retrieve utf8 strings as unicode() objects in salt
# and we do not want that.
#_connarg('connection_use_unicode', 'use_unicode')
connargs['use_unicode'] = False
_connarg('connection_charset', 'charset')

We see that use_unicode is set to False to avoid altering the result strings, while the charset (which could be utf8) is still passed as a parameter. use_unicode is more of a 'request' to get responses back as unicode strings.
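
As a rough sketch of the same idea outside Salt: charset and use_unicode are independent keyword arguments to MySQLdb.connect. The helper names connect_kwargs and open_connection are hypothetical, and the credentials are the placeholders from the question, so treat this as an illustration rather than a recommended configuration:

```python
def connect_kwargs(charset="utf8", use_unicode=False):
    """Build keyword arguments for MySQLdb.connect (hypothetical helper)."""
    return {
        "host": "localhost",
        "user": "usr",
        "passwd": "pass",
        "db": "dbName",
        "charset": charset,          # character set used on the connection
        "use_unicode": use_unicode,  # whether text results come back as unicode
    }


def open_connection(**overrides):
    """Open a connection with the settings above (needs a reachable server)."""
    import MySQLdb  # imported lazily so connect_kwargs works without the driver

    return MySQLdb.connect(**connect_kwargs(**overrides))
```

Calling open_connection(use_unicode=True) would flip only the result-decoding request and leave the charset untouched.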

You can check real usage in the tests, here: https://github.com/saltstack/salt/blob/develop/tests/integration/modules/test_mysql.py#L311-L361 with a database named '標準語'.

Now about the message UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1'. You are passing a unicode string but telling the module it is utf-8. It is not utf-8 until you encode your unicode string to utf-8.
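
For what it's worth, '\udcd1' is a lone surrogate. Python 3 uses these to carry an undecodable byte (here 0xD1) through a str via the surrogateescape error handler. Below is a minimal sketch, assuming Python 3, of how such a string can arise and why encoding it fails. Whether this is what actually happened in the question is only a guess:

```python
raw = "т".encode("utf-8")                          # b'\xd1\x82'
smuggled = raw.decode("ascii", "surrogateescape")  # undecodable bytes become lone surrogates

try:
    smuggled.encode("utf-8")                       # strict UTF-8 refuses lone surrogates
except UnicodeEncodeError as exc:
    error_reason = exc.reason

# Reversing the escape recovers the original bytes, which then decode cleanly.
restored = smuggled.encode("ascii", "surrogateescape").decode("utf-8")
print(repr(smuggled), error_reason, restored)
```

If the text reaches the script through a channel decoded this way (a wrongly configured locale, for example), fixing the decoding at the source is probably cleaner than patching the string afterwards.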

Maybe you should try with:

args = (u"какой-то текст".encode('utf-8'),)  # trailing comma so that args is a tuple

At least in Python 3 this is required, because "какой-то текст" is a str of Unicode code points, not UTF-8 bytes, until you encode it.
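
To make the difference concrete, here is a small sketch, assuming Python 3. pr_MyProc is the procedure name from the question, and the callproc line is left commented out because it needs a live connection. Note the trailing comma: without it the parentheses do not create a tuple.

```python
text = "какой-то текст"          # str: a sequence of Unicode code points
encoded = text.encode("utf-8")   # bytes: the UTF-8 representation

args = (encoded,)                # one-element tuple; the trailing comma matters
not_a_tuple = (encoded)          # just the same bytes object again

print(type(text).__name__, len(text))        # str 14
print(type(encoded).__name__, len(encoded))  # bytes 26
print(type(args).__name__, type(not_a_tuple).__name__)
# cursor.callproc('pr_MyProc', args)
```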

regilero answered Oct 06 '22 00:10