Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python3 encode replace unicode characters

According to the documentation, the following command

'Brückenspinne'.encode("utf-8",errors='replace')

should give me the byte sequenceb'Br??ckenspinne'. However, unicode characters are not replaced but encoded nevertheless:

b'Br\xc3\xbcckenspinne'

Can you tell me how I actually eliminate Unicode characters? (I use replace for testing purposes, I intend to use 'xmlcharrefreplace' later. To be totally honest, I want to convert the unicode characters to their xmlcharref, keeping everything as a string).

like image 487
Lærne Avatar asked Apr 08 '26 12:04

Lærne


1 Answers

utf-8 encoding can represent the character ü; there is no error, so no replacement occurs.

To see errors='replace' in action, use another encoding that cannot represent the character. For example ascii:

>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'

>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Brückenspinne'
like image 142
falsetru Avatar answered Apr 11 '26 02:04

falsetru



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!