DjangoUnicodeDecodeError while storing pickle'd data

I've got a simple dict object that I'm trying to store in the database after running it through pickle. Django seems to fail while trying to encode the pickled data. I've checked with MySQL, and the error is thrown before any query reaches it, so I don't believe the database is the problem. The dict I'm storing looks like this:

{
    'ordered': [
        {   'value': u'First\xd1ame Last\xd1ame',
            'label': u'Full Name' },
        {   'value': u'123-456-7890',
            'label': u'Phone Number' },
        {   'value': u'[email protected]',
            'label': u'Email Address' } ],
    'cleaned_data': {
        u'Phone Number': u'123-456-7890',
        u'Full Name': u'First\xd1ame Last\xd1ame',
        u'Email Address': u'[email protected]' },
    'post_data': <QueryDict: {
        u'Phone Number': [u'1234567890'],
        u'Full Name_1': [u'Last\xd1ame'],
        u'Full Name_0': [u'First\xd1ame'],
        u'Email Address': [u'[email protected]'] }>,
    'user': <User: itis>
}

The error that gets thrown is:

'utf8' codec can't decode bytes in position 52-53: invalid data.

Position 52-53 is the first instance of \xd1 (Ñ) in the pickled data.

So far, I've dug around Stack Overflow and found a few questions where the database encoding for the objects was wrong. That doesn't help here, because the error is raised before any MySQL query is issued. Searching Google for unicode errors on pickled data didn't turn up much either.

It is probably worth mentioning that if I don't use the Ñ, this code works fine.

Jack M. asked Feb 28 '23 07:02


2 Answers

With much thanks to @prometheus, I found a solution for this. Basically you can use base64 to encode the output of pickle.dumps() before plugging it into the database. You would then turn around and use base64 to decode the output of the database before passing it to pickle.loads().

My code now looks like this:

import base64
import pickle

## Put the information into the database:
self.raw_data = base64.b64encode(pickle.dumps(data))

## Get the information out of the database:
return pickle.loads(base64.b64decode(self.raw_data))

Again, thank you @prometheus.
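
For anyone who wants to try the round trip outside of a model, here is a minimal standalone sketch. It assumes Python 3, where b64encode() returns bytes, so the result is decoded to an ASCII str before it would go into a text column. The helper names encode_for_text_column and decode_from_text_column are made up for this example and are not part of Django or of my actual code.

```python
import base64
import pickle


def encode_for_text_column(obj):
    """Pickle obj and return an ASCII-only str (hypothetical helper)."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def decode_from_text_column(text):
    """Reverse encode_for_text_column (hypothetical helper)."""
    return pickle.loads(base64.b64decode(text))


# Sample data modelled on the dict from the question.
data = {"value": "First\xd1ame Last\xd1ame", "label": "Full Name"}

stored = encode_for_text_column(data)        # safe to put in a text field
restored = decode_from_text_column(stored)   # original dict comes back
```

Because the stored value only contains base64 characters, it no longer matters which text encoding is applied to it on the way to the database.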

Jack M. answered Mar 12 '23 13:03


That's a known problem. It was discussed on the Python bug tracker, where one commenter wrote:

I ran into this problem today when writing python data structures into a database. Only ASCII is safe in this situation. I understood the Python docs that protocol 0 was ASCII-only.

I use pickle+base64 now, however, this makes debugging more difficult.

Anyway, I think that the docs should clearly say that protocol 0 is not ASCII-only because this is important in the Python world. For example, I saw this issue because Django makes an implicit unicode() conversion with my input which fails with non-ASCII.
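
A quick way to see the point about protocol 0 is the sketch below. It assumes Python 3 (the question uses Python 2, where the exact byte positions differ), and the variable names are only for illustration. Protocol 0 writes the Ñ as a single raw byte 0xD1, which is neither ASCII nor a valid UTF-8 sequence on its own, so a UTF-8 decode of the pickled bytes fails.

```python
import pickle

name = "First\xd1ame"                      # contains Ñ (U+00D1)
payload = pickle.dumps(name, protocol=0)   # the "text" protocol

# Protocol 0 output is not guaranteed to be ASCII.
has_non_ascii = any(byte > 127 for byte in payload)

# Treating the pickled bytes as UTF-8 text fails on the raw 0xD1 byte.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Without the Ñ the payload happens to be pure ASCII, which matches the
# observation in the question that the code works for plain names.
ascii_payload = pickle.dumps("Firstname", protocol=0)
ascii_only = all(byte < 128 for byte in ascii_payload)
```

This is presumably the same failure the implicit unicode() conversion ran into, just reproduced without Django.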

nikola answered Mar 12 '23 12:03