python3 encode replace unicode characters

Question

According to the documentation, the following command

'Brückenspinne'.encode("utf-8",errors='replace')

should give me the byte sequence b'Br??ckenspinne'. However, the non-ASCII character is not replaced but encoded anyway:

b'Br\xc3\xbcckenspinne'

Can you tell me how I actually eliminate non-ASCII characters? (I use 'replace' for testing purposes; I intend to use 'xmlcharrefreplace' later. To be totally honest, I want to convert the non-ASCII characters to their XML character references, keeping everything as a string.)

falsetru · Accepted Answer

The utf-8 encoding can represent the character ü, so encoding never fails and the errors handler is never invoked; no replacement occurs.
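As a rough check of that point, the sketch below compares the strict and replace handlers under utf-8 and shows that ascii in strict mode fails on the same string. The variable names are only illustrative.

```python
word = "Brückenspinne"

# UTF-8 can encode every character here, so the error handler is never consulted
strict_utf8 = word.encode("utf-8", errors="strict")
replace_utf8 = word.encode("utf-8", errors="replace")
print(strict_utf8 == replace_utf8)  # True

# ASCII cannot represent 'ü', so strict mode raises UnicodeEncodeError
try:
    word.encode("ascii", errors="strict")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)  # True
```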

To see errors='replace' in action, use an encoding that cannot represent the character, for example ascii:

>>> 'Brückenspinne'.encode("ascii", errors='replace')
b'Br?ckenspinne'

>>> 'Brückenspinne'.encode("ascii", errors='xmlcharrefreplace')
b'Br&#252;ckenspinne'
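Since the question asks to keep everything as a string, one possible approach is to decode the resulting bytes back with ascii. This is a minimal sketch; the helper name to_xmlcharrefs is hypothetical and not part of the standard library.

```python
def to_xmlcharrefs(text: str) -> str:
    # Encode to ASCII, turning each non-ASCII character into &#NNN;,
    # then decode the pure-ASCII bytes back into a str
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

result = to_xmlcharrefs("Brückenspinne")
print(result)  # Br&#252;ckenspinne
```

The decode step cannot fail here, because the output of the ascii encoder contains only ASCII bytes.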

Tags:

python

python-3.x

unicode

python-unicode

Lærne
