Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UnicodeDecodeError: 'ascii' codec can't decode byte in position : ordinal not in range(128)

I have done a bit of research on this error and can't really get my head around what's going on. As far as I understand I am basically having problems because I am converting from one type of encoding to another.

def write_table_to_file(table, connection):
    db_table = io.StringIO()
    cur = connection.cursor()
    #pdb.set_trace()
    cur.copy_to(db_table, table)
    cur.close()
    return db_tabl

This is the method that is giving me head aches. The below error is output when I run this method

[u350932@config5290vm0 python3]$ python3 datamain.py 
Traceback (most recent call last):
  File "datamain.py", line 48, in <module>
    sys.exit(main())
  File "datamain.py", line 40, in main
    t = write_table_to_file("cms_jobdef", con_tctmsv64)
  File "datamain.py", line 19, in write_table_to_file
    cur.copy_to(db_table, table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 40: ordinal not in range(128)

The client encoding on the database im retrieving the table from is

tctmsv64=> SHOW CLIENT_ENCODING;
 client_encoding
-----------------
 sql_ascii
(1 row)

The database encoding is LATIN1

The encoding for the database I am putting them onto is

S104838=# SHOW CLIENT_ENCODING;
 client_encoding
-----------------
 WIN1252
(1 row)

The database encoding is UTF8

From the threads I have found they recommend to change the encoding

To correct your function, you'll have to know what encoding the byte
string is in, and convert it to unicode using the decode() method,
and compare that result to the unicode string.

http://www.thecodingforums.com/threads/unicodedecodeerror-ascii-codec-cant-decode-byte-0xa0-in-position-10-ordinal-not-in-range-128.336691/

The problem is when I try and use the decode methods I get complaints that its not a file type. I have had a look at the python 3.4 methods for class io.StringIO(initial_value='', newline='\n')¶ method but could not find anything on changing the encoding.

I also found this page which outlined the problem but I couldn't figure out what I needed to do to solve it

https://wiki.python.org/moin/UnicodeDecodeError

Basically I'm quite confused as to what is going on and not sure how to fix it. Any help would be greatly appreciated.

Cheers

like image 572
Justin S Avatar asked Sep 30 '22 11:09

Justin S


2 Answers

Python 3 changed file I/O behaviour around text encodings - massively for the better, IMO. You may find Processing Text Files in Python 3 informative.

It looks like psycopg2 is seeing that you passed a raw file object and is trying to encode the strings it's working with into byte sequences for writing to the file, with the assumption (since you didn't specify anything else) that you want to use the ascii encoding for the file.

I'd use an io.BytesIO object instead of StringIO, and specify the source encoding when you do a copy_from into the new database.

I'll be surprised if you don't have problems due to invalid, mixed, or otherwise borked text from your SQL_ASCII source database, though.

like image 89
Craig Ringer Avatar answered Dec 16 '22 09:12

Craig Ringer


First of thanks Craig for your response. It was very helpful in making me realise that I needed to find a good way of doing this otherwise the data in my new database would be corrupt. Not something we want! After a bit more googling this link was very useful

https://docs.python.org/3/howto/unicode.html

I ended up using the StreamRecorder module and it works very well. Below is a snippet of my working code

def write_table_to_file(table, connection):
    db_table = io.BytesIO()
    cur = connection.cursor()
    cur.copy_to(codecs.StreamRecoder(db_table,codecs.getencoder('utf-8'), codecs.getdecoder('latin-1'),
                                     codecs.getreader('utf-8'), codecs.getwriter('utf-8')), table)
    cur.close()
    return db_table

Long story short I convert from latin-1 to utf-8 on the fly and it all works and my data looks good. Thanks again for the feedback Craig :)

like image 38
Justin S Avatar answered Dec 16 '22 10:12

Justin S