Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

I am using Python 2.7 and MySQLdb 1.2.3. I tried everything I found on stackoverflow and other forums to handle encoding errors my script is throwing. My script reads data from all tables in a source MySQL DB, writes them in a python StringIO.StringIO object, and then loads that data from StringIO object to Postgres database (which apparently is in UTF-8 encoding format. I found this by looking into Properties--Definition of database in pgadmin) using psycopg2 library's copy_from command.

I found out that my source MySQL database has some tables in latin1_swedish_ci encoding while others in utf_8 encoding format (Found this from TABLE_COLLATION in information_schema.tables).

I wrote all this code on the top of my Python script based on my research on the internet.

db_conn = MySQLdb.connect(host=host,user=user,passwd=passwd,db=db, charset="utf8", init_command='SET NAMES UTF8' ,use_unicode=True) 
db_conn.set_character_set('utf8') 
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')

I still get the UnicodeEncodeError below with this line: cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value,

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

I wrote the following line of code to clean cells in every table of source MySQL database when writing to StringIO object.

cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value

Please help.

like image 548
user3422637 Avatar asked Oct 29 '14 21:10

user3422637


People also ask

How do I fix UnicodeEncodeError in Python?

Only a limited number of Unicode characters are mapped to strings. Thus, any character that is not-represented / mapped will cause the encoding to fail and raise UnicodeEncodeError. To avoid this error use the encode( utf-8 ) and decode( utf-8 ) functions accordingly in your code.

How do I print Unicode characters in Python?

To print Unicode character in Python we can use the \u escape sequence. We can use the \u escape sequence to print Unicode character in Python. We can specify the code point with this sequence to display the character.


1 Answers

str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 255. What is cell?

If cell is a unicode string, just do cell.encode("utf8"), and that will return a bytestring encoded as utf 8

...or really iirc. If you pass mysql unicode, then the database will automagically convert it to utf8...

You could also try,

cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")

or just use a 3rd party library. There is a good one that will fix text for you.

like image 158
Joran Beasley Avatar answered Nov 15 '22 00:11

Joran Beasley