According to the documentation, the following command
'Brückenspinne'.encode("utf-8",errors='replace')
should give me the byte sequence b'Br??ckenspinne'. However, the Unicode characters are not replaced but encoded anyway:
b'Br\xc3\xbcckenspinne'
Can you tell me how I can actually eliminate the Unicode characters? (I use 'replace' for testing purposes; I intend to use 'xmlcharrefreplace' later. To be totally honest, I want to convert the Unicode characters to their XML character references, keeping everything as a string.)
The utf-8 encoding can represent the character ü; there is no error, so no replacement occurs. The errors argument only takes effect when the codec cannot encode a character.
To see errors='replace' in action, use another encoding that cannot represent the character. For example, ascii:
>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'
>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Br&#252;ckenspinne'
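Since the question asks to keep everything as a string, one possible approach is to decode the resulting bytes again. The following is only a minimal sketch: the helper name to_xmlcharrefs is made up for illustration, and it assumes ascii is the target encoding, so every non-ASCII character becomes a numeric character reference.

```python
def to_xmlcharrefs(text: str) -> str:
    # Hypothetical helper, not part of the standard library.
    # Encode to ASCII; characters ASCII cannot represent are replaced
    # with numeric references of the form &#NNN;.
    encoded = text.encode("ascii", errors="xmlcharrefreplace")
    # The bytes are now pure ASCII, so decoding them back cannot fail.
    return encoded.decode("ascii")


result = to_xmlcharrefs("Brückenspinne")
print(result)  # Br&#252;ckenspinne
```

The result is a str rather than bytes, and plain ASCII characters pass through unchanged.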