Make Python replace un-encodable chars with a string by default

Tags: python, replace, encode

I want Python to handle characters it can't encode by simply replacing each of them with the string "<could not encode>".

E.g., assuming the default encoding is ASCII, the expression

'%s is the word'%'ébác'

would yield

'<could not encode>b<could not encode>c is the word'

Is there any way to make this the default behavior across my whole project?

asked Dec 19 '09 15:12

olamundo

2 Answers

The str.encode function takes an optional argument defining the error handling:

str.encode([encoding[, errors]])

From the docs:

Return an encoded version of the string. Default encoding is the current default string encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes. For a list of possible encodings, see section Standard Encodings.
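As a quick illustration of the built-in error schemes named in that passage, here is a minimal sketch. It assumes Python 3 (where str is Unicode text and encode returns bytes) and writes the accented letters as escapes so the snippet does not depend on the source file's encoding; the variable names are only for illustration.

```python
# "\u00e9" is e-acute and "\u00e1" is a-acute, i.e. the text "ébác is the word".
text = "\u00e9b\u00e1c is the word"

# 'strict' (the default) raises UnicodeEncodeError on the first bad character.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc

# 'ignore' silently drops the characters that cannot be encoded.
ignored = text.encode("ascii", "ignore")              # b'bc is the word'

# 'replace' substitutes a question mark for each bad character.
replaced = text.encode("ascii", "replace")            # b'?b?c is the word'

# 'xmlcharrefreplace' writes numeric XML character references.
xml_refs = text.encode("ascii", "xmlcharrefreplace")  # b'&#233;b&#225;c is the word'

# 'backslashreplace' writes Python-style backslash escapes.
escaped = text.encode("ascii", "backslashreplace")    # b'\\xe9b\\xe1c is the word'
```

None of the built-in schemes inserts a custom string, which is why a handler registered with codecs.register_error is needed for "<could not encode>".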

In your case, the codecs.register_error function might be of interest.

[Note about bad chars]

By the way, one caveat when using register_error: the error handler is called once per run of consecutive bad characters, not once per character. The exception's start and end attributes give the bounds of the run. Unless you account for that, a whole group of bad characters is replaced by a single copy of your string rather than one copy per character.
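A minimal sketch of such a handler, assuming Python 3. The handler name "placeholder" and the function name replace_with_placeholder are made up for this example; codecs.register_error and the start/end attributes of UnicodeEncodeError are the real API. Multiplying the placeholder by the length of the run gives one copy per bad character regardless of how the codec groups them.

```python
import codecs

PLACEHOLDER = "<could not encode>"

def replace_with_placeholder(exc):
    """Error handler: emit one placeholder per un-encodable character."""
    # Only handle encoding errors; re-raise anything else unchanged.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.start and exc.end delimit the run of bad characters.
    bad_count = exc.end - exc.start
    # Return the replacement text and the position at which to resume.
    return (PLACEHOLDER * bad_count, exc.end)

# "placeholder" is a hypothetical name chosen for this example.
codecs.register_error("placeholder", replace_with_placeholder)

# Isolated bad characters: each becomes one placeholder.
result = "\u00e9b\u00e1c is the word".encode("ascii", "placeholder")
# b'<could not encode>b<could not encode>c is the word'

# Two adjacent bad characters: still one placeholder each.
run = "\u00e9\u00e1bc".encode("ascii", "placeholder")
# b'<could not encode><could not encode>bc'
```

Registering the handler does not change the default error scheme; the name still has to be passed as the errors argument on each encode call, so a small project-wide wrapper function is one way to apply it consistently.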

answered Nov 07 '22 02:11

miku

>>> help("".encode)
Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> object

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that is able to handle UnicodeEncodeErrors.

So, for instance:

>>> x
'\xc3\xa9b\xc3\xa1c is the word'
>>> x.decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> x.decode("ascii", "replace")
u'\ufffd\ufffdb\ufffd\ufffdc is the word'

Add your own callback to codecs.register_error to replace with the string of your choice.
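Since the session above goes in the decode direction, here is a rough sketch of a callback for that case, assuming Python 3 (where the byte string is written with a b prefix and decode returns str). The names "decode_placeholder" and placeholder_on_decode, and the wording of the placeholder, are invented for this example.

```python
import codecs

PLACEHOLDER = "<could not decode>"

def placeholder_on_decode(exc):
    """Error handler: emit one placeholder per undecodable byte."""
    # This sketch only covers decoding errors; re-raise anything else.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.start and exc.end delimit the offending bytes in the input.
    bad_count = exc.end - exc.start
    return (PLACEHOLDER * bad_count, exc.end)

# "decode_placeholder" is a hypothetical name chosen for this example.
codecs.register_error("decode_placeholder", placeholder_on_decode)

# The same UTF-8 bytes as in the session above.
raw = b"\xc3\xa9b\xc3\xa1c is the word"
decoded = raw.decode("ascii", "decode_placeholder")
```

Each accented letter occupies two bytes in UTF-8, so decoding as ASCII yields two placeholders per letter, mirroring the two \ufffd characters per letter in the 'replace' output above.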

answered Nov 07 '22 03:11

J.J.
