Can I make decode(errors="ignore") the default for all strings in a Python 2.7 program?

I have a Python 2.7 program that writes out data from various external applications. I continually get bitten by exceptions when I write to a file until I add .decode(errors="ignore") to the string being written out. (FWIW, opening the file as mode="wb" doesn't fix this.)

Is there a way to say "ignore encoding errors on all strings in this scope"?

Paul Hoffman asked Mar 02 '12 17:03


2 Answers

As mentioned in my thread on the issue, the hack from Sven Marnach's answer is possible even without defining a new function, because the codecs module already provides an ignore handler:

import codecs
codecs.register_error("strict", codecs.ignore_errors)
phk answered Oct 07 '22 13:10


You cannot redefine methods on built-in types, and you cannot change the default value of the errors parameter to str.decode(). There are other ways to achieve the desired behaviour, though.

The slightly nicer way: Define your own decode() function:

def decode(s, encoding="ascii", errors="ignore"):
    return s.decode(encoding=encoding, errors=errors)

Now you will need to call decode(s) instead of s.decode(), but that's not too bad, is it?
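For illustration, here is roughly how that helper behaves on bytes that aren't pure ASCII. The sample bytes and variable names are made up for this sketch, and the b"" literals are only there so it runs unchanged on Python 2.7 and Python 3.

```python
def decode(s, encoding="ascii", errors="ignore"):
    # Same helper as above; arguments passed positionally.
    return s.decode(encoding, errors)

# Hypothetical input: UTF-8 bytes for "café au lait".
raw = b"caf\xc3\xa9 au lait"

# With the helper's defaults, the two non-ASCII bytes are silently dropped.
text = decode(raw)

# Passing the right encoding keeps the character instead of discarding it.
lossless = decode(raw, encoding="utf-8", errors="strict")
```

Note that the default call loses the "é" without any warning, which is exactly the trade-off of errors="ignore".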

The hack: You can't change the default value of the errors parameter, but you can overwrite what the handler for the default errors="strict" does:

import codecs
def strict_handler(exception):
    return u"", exception.end
codecs.register_error("strict", strict_handler)

This will essentially change the behaviour of errors="strict" to the standard "ignore" behaviour. Note that this is a process-wide change: it affects every module you import, not just your own code.
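To see the handler mechanism in isolation without touching "strict", the same kind of function can be registered under its own name. This is only a sketch: the name "drop_undecodable" and the sample bytes are made up, and it does not change any default, so the name still has to be passed explicitly.

```python
import codecs

def drop_handler(exception):
    # Only deal with decode errors; anything else is re-raised unchanged.
    if not isinstance(exception, UnicodeDecodeError):
        raise exception
    # Substitute nothing for the bad bytes and resume after them.
    return u"", exception.end

# Hypothetical handler name, chosen for this example only.
codecs.register_error("drop_undecodable", drop_handler)

raw = b"abc\xffdef"
text = raw.decode("ascii", "drop_undecodable")

# The default "strict" behaviour is untouched and still raises.
try:
    raw.decode("ascii")
    strict_still_raises = False
except UnicodeDecodeError:
    strict_still_raises = True
```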

I recommend neither of these two ways. The real solution is to get your encodings right. (I'm well aware that this isn't always possible.)
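As a rough sketch of what "getting your encodings right" can look like: decode the incoming bytes once, with the encoding they actually use, and let the file object encode on the way out. The UTF-8 assumption, the sample bytes and the temporary file are all made up for illustration; io.open is used because it accepts an encoding argument on Python 2.7 as well as Python 3.

```python
import io
import os
import tempfile

# Hypothetical bytes from an external application, assumed to be UTF-8.
raw = b"na\xc3\xafve caf\xc3\xa9"

# Decode once at the boundary; this raises if the assumption is wrong.
text = raw.decode("utf-8")

# Write unicode text and let the file object do the encoding.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back as bytes to check that nothing was lost.
with io.open(path, "rb") as f:
    written = f.read()
```

If the real encoding of the input is unknown, this will still fail, but it fails loudly at the point where the bad data enters the program rather than silently dropping characters.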

Sven Marnach answered Oct 07 '22 14:10