I am trying to push user account data from Active Directory to our MySQL server. This works flawlessly, except that the strings end up containing an encoded version of umlauts and other special characters.
Active Directory returns strings in this format: M\xc3\xbcller
This is actually the UTF-8 encoding of Müller, but I want to write Müller to my database, not M\xc3\xbcller.
I tried converting the string with this line, but the database still ends up with the same string:

    tempEntry[1] = tempEntry[1].decode("utf-8")

If I run the following in the Python console, the output is correct:

    print "M\xc3\xbcller".decode("utf-8")
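For comparison, here is the same console check in Python 3 terms. This is a sketch that assumes the directory value arrives as raw UTF-8 `bytes`. The question itself uses Python 2, where a plain `str` is already a byte string. Decoding turns the 7 bytes into 6 characters, and the umlaut becomes the single code point U+00FC.

```python
# Assumption: the directory hands back raw UTF-8 bytes (Python 3 `bytes`).
raw = b"M\xc3\xbcller"          # 7 bytes: 'ü' is the two bytes C3 BC

# Decoding produces text; 'ü' is now one character.
name = raw.decode("utf-8")      # 'Müller', 6 characters

print(len(raw), len(name))      # prints: 7 6
print(hex(ord(name[1])))        # prints: 0xfc (U+00FC)
```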
Is there any way to insert this string correctly? A web developer needs the data in exactly this format; I don't know why he can't convert the string directly in PHP.
Additional info: I am using MySQLdb; the table and column collation is utf8_general_ci.
As @marr75 suggests, make sure you set charset='utf8' on your connections. Setting use_unicode=True is not strictly necessary, as it is implied by setting the charset.
Then make sure you are passing unicode objects to your connection, because MySQLdb encodes them using the charset you passed to connect(). If you pass a UTF-8-encoded byte string instead, it will be doubly encoded by the time it reaches the database.
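To make "doubly encoded" concrete, the following pure-Python 3 sketch simulates one common way it happens. It assumes UTF-8 bytes get misread as latin-1 (MySQL's traditional default character set) and are then encoded to UTF-8 a second time. This mechanism is an illustration, not something the answer above confirms for this particular setup.

```python
# Text as it should end up in the column.
name = "Müller"
utf8_bytes = name.encode("utf-8")          # b'M\xc3\xbcller'

# Step 1: the UTF-8 bytes are misread as latin-1 characters ...
misread = utf8_bytes.decode("latin-1")     # 'MÃ¼ller'

# Step 2: ... and encoded to UTF-8 again on the way into the column.
double_encoded = misread.encode("utf-8")   # b'M\xc3\x83\xc2\xbcller'
print(double_encoded)                      # prints: b'M\xc3\x83\xc2\xbcller'

# Reversing both steps recovers the original text.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
assert repaired == name
```

Every non-ASCII character grows from two bytes to four here. That growth is a quick way to tell double encoding apart from a correctly stored value.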
So, something like:
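    import MySQLdb
    conn = MySQLdb.connect(host="localhost", user='root', passwd='', db='', charset='utf8')
    data_from_ldap = 'M\xc3\xbcller'          # UTF-8 byte string (Python 2 str)
    name = data_from_ldap.decode('utf8')      # unicode object: u'M\xfcller'
    cursor = conn.cursor()
    cursor.execute(u"INSERT INTO mytable SET name = %s", (name,))
    conn.commit()                             # MySQLdb does not autocommit by default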
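If each directory entry has several fields (like the question's `tempEntry` list), it can help to decode them all in one place before building the query parameters. What follows is a minimal Python 3 sketch. `to_text`, `normalize_entry` and the sample entry are hypothetical names, and the code assumes the LDAP library returns attribute values as `bytes`. The actual `cursor.execute` call is left as a comment because it needs a live server.

```python
from typing import Iterable, List, Union


def to_text(value: Union[bytes, str], encoding: str = "utf-8") -> str:
    """Return `value` as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_entry(entry: Iterable[Union[bytes, str]]) -> List[str]:
    """Decode every field of one (hypothetical) directory entry."""
    return [to_text(field) for field in entry]


# Hypothetical entry, shaped like the question's tempEntry list.
temp_entry = [b"jmueller", b"M\xc3\xbcller"]
row = normalize_entry(temp_entry)   # ['jmueller', 'Müller']

# With a live connection opened with charset='utf8', the decoded values
# would then be passed as query parameters, e.g.:
# cursor.execute("INSERT INTO mytable SET name = %s", (row[1],))
```

Values that are already text pass through unchanged. That makes the helper safe to call whether or not a given field was decoded earlier.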
You can also try forcing the connection to use UTF-8 by passing the init_command parameter, though I'm not sure it is required. Five minutes of testing should help you decide.
    conn = MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')
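For that testing, one option is to ask the server which character sets the session actually ended up with. The sketch below uses a hypothetical helper, `session_charsets`, that works with any DB-API connection object; the statement it runs is MySQL's `SHOW VARIABLES LIKE 'character_set%'`.

```python
from typing import Any, Dict


def session_charsets(conn: Any) -> Dict[str, str]:
    """Return the character_set_* session variables of a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW VARIABLES LIKE 'character_set%'")
        # Each row is a (Variable_name, Value) pair.
        return {name: value for name, value in cur.fetchall()}
    finally:
        cur.close()
```

Once charset='utf8' is in effect, `session_charsets(conn)['character_set_client']` would typically report utf8 (newer servers may show utf8mb3). If it still shows latin1, adding init_command is probably worthwhile.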
Also, and this is barely worth mentioning since 4.1 is so old, make sure you are using MySQL 4.1 or later, the release that introduced character set support and SET NAMES.