I have code:
encoding = guess_encoding()
text = unicode(text, encoding)
When an invalid character appears in text, a UnicodeDecodeError is raised. How can I silently skip the exception and replace the invalid character with '?'?
Try
text = unicode(text, encoding, "replace")
From the documentation:
'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded.
If you want to use "?" instead of the official Unicode replacement character, you can do
text = text.replace(u"\uFFFD", "?")
after converting to unicode.
In Python 3, you can decode a bytes object into a string using the decode method. It accepts two parameters: encoding, which is "utf-8" by default, and errors, which defines what to do on illegal character sequences. The default value is "strict", which raises a UnicodeDecodeError; other alternatives are "ignore" and "replace" -- the latter replaces illegal characters with the Unicode replacement character "\uFFFD". Therefore, you'd need to do this to decode-and-replace:
encoding = guess_encoding()
text = text_bytes.decode(encoding, errors='replace').replace('\uFFFD', '?')
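To see what the different errors values do in practice, here is a minimal sketch. The helper name decode_lenient and the sample bytes are made up for illustration and are not part of the original code; guess_encoding() is left out so the snippet runs on its own.

```python
def decode_lenient(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, substituting '?' for anything that cannot be decoded.

    Hypothetical helper, named here only for illustration.
    """
    # errors="replace" inserts U+FFFD instead of raising UnicodeDecodeError.
    decoded = data.decode(encoding, errors="replace")
    # Swap the official replacement character for a plain question mark.
    return decoded.replace("\ufffd", "?")


# Sample input: 0xE9 is 'é' in Latin-1 but is not valid UTF-8 at this position.
raw = b"caf\xe9 au lait"

print(decode_lenient(raw))                   # caf? au lait
print(raw.decode("utf-8", errors="ignore"))  # caf au lait
print(decode_lenient(raw, "latin-1"))        # café au lait
```

Note that the final replace also turns any genuine U+FFFD already present in the input into "?", which is usually acceptable but worth knowing.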
As Sven Marnach pointed out in a comment, if the text comes from a file you can supply the errors argument directly to open; otherwise you'd get the decode errors while reading the file (whenever a byte falls outside the character map).
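A minimal sketch of the file-reading case, assuming the file is opened in text mode. The helper name read_text_lenient and the temporary sample file are invented for this example only.

```python
import os
import tempfile


def read_text_lenient(path, encoding="utf-8"):
    """Read a text file, turning undecodable bytes into '?'.

    Hypothetical helper, named here only for illustration.
    """
    # Passing errors="replace" to open() makes the file object itself
    # substitute U+FFFD, so read() never raises UnicodeDecodeError.
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read().replace("\ufffd", "?")


# Create a throwaway file that contains a byte which is invalid in UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 au lait\n")
    sample_path = tmp.name

try:
    sample_text = read_text_lenient(sample_path)
finally:
    os.remove(sample_path)

print(sample_text)  # caf? au lait
```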