
How to load a pickle file containing a dictionary with unicode characters?

I have a dictionary:

mydict={'öö':1,'ää':2}

I have written it to a pickle file:

a=codecs.open(r'mydict.pkl', 'wb', 'utf-8')
pickle.dump(mydict, a)

If I try to load it:

m=codecs.open(r'mydict.pkl', 'rb', 'utf-8')
mydict = pickle.load(m)

I get an error:

KeyError: u"S'\\xe4\\xe4'\np1\nI2\nsS'\\xf6\\xf6'\np2\nI1\ns."

Any ideas how to solve this? Help is greatly appreciated.

root asked Mar 19 '12 16:03


3 Answers

pickle is a binary format, so running it through a codec translation before writing will break it. Just write to a plain file opened in binary mode and load it back the same way:

>>> import pickle
>>> mydict={'öö':1,'ää':2}
>>> mydict
{'\xc3\xb6\xc3\xb6': 1, '\xc3\xa4\xc3\xa4': 2}
>>> pickle.dump(mydict, open('/tmp/test.pkl', 'wb'))
>>> pickle.load(open('/tmp/test.pkl', 'rb'))
{'\xc3\xb6\xc3\xb6': 1, '\xc3\xa4\xc3\xa4': 2}

Most likely, though, you want Unicode strings as keys in the first place:

>>> mydict={u'öö':1,u'ää':2}
Niklas B. answered Oct 03 '22 21:10


I believe the problem is the use of codecs.open. Pickles are binary data, not text, whereas codecs.open is meant for transparent conversion between a text encoding and unicode. You should just use the built-in open (in binary mode) instead.
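As a minimal sketch of that advice, the round trip below uses the built-in open in binary mode with no codec layer. The temporary directory and the names `path` and `loaded` are illustrative assumptions, not part of the original question; the `u''` literals follow the suggestion to use Unicode keys.

```python
# -*- coding: utf-8 -*-
import os
import pickle
import tempfile

mydict = {u'öö': 1, u'ää': 2}

# Hypothetical location for the example; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'mydict.pkl')

# Binary mode, no codec: the pickle bytes reach the disk unchanged.
with open(path, 'wb') as f:
    pickle.dump(mydict, f)

# Read the same bytes back, again in binary mode.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == mydict)  # True
```

The `with` blocks also make sure the file is flushed and closed before it is read back, which the one-line `open(...)` calls leave to the interpreter.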

Geoff Reedy answered Oct 03 '22 21:10


An old question, but I had the same problem and did not think extra disk I/O was a good solution. I suggest base64 encoding and decoding the pickled data instead:

import base64
import pickle

serialized_str = base64.b64encode(pickle.dumps(mydict))
my_obj_back = pickle.loads(base64.b64decode(serialized_str))

cPickle can be used the same way for faster results in batches.
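A slightly fuller sketch of the same idea, assuming the goal is to carry the pickle through a text-only channel (a text column, a config value, and so on). The names `encoded`, `text`, and `restored` are illustrative. The base64 output contains only ASCII characters, so it survives text handling that would corrupt raw pickle bytes.

```python
# -*- coding: utf-8 -*-
import base64
import pickle

mydict = {u'öö': 1, u'ää': 2}

# pickle.dumps returns bytes; base64 maps them onto ASCII characters.
encoded = base64.b64encode(pickle.dumps(mydict))

# Decode to a text string so it can travel through text-only storage.
text = encoded.decode('ascii')

# Reverse the steps: text -> base64 bytes -> pickle bytes -> object.
restored = pickle.loads(base64.b64decode(text.encode('ascii')))

print(restored == mydict)  # True
```

Base64 adds roughly a third to the size of the data, so it is only worth using when the channel really is text-only. For a file on disk, plain binary mode is simpler.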

JSBach answered Oct 03 '22 22:10