I want to do this:
Take the bytes of this utf-8 string:
访视频
Encode those bytes in latin-1 and print the result:
è®¿è§†é¢‘
How do I do this in Python?
# -*- coding: utf-8 -*-
s = u'访视频'.encode('latin-1')
Causes this exception:
s = u'访视频'.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)
What you're asking to do is literally impossible. You can't encode those characters to Latin-1, because those characters don't exist in Latin-1.
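To see why, here is a minimal sketch (assuming Python 3, where plain string literals are already Unicode; the variable names are just for illustration). Latin-1 only covers code points 0 through 255, and every character in this string sits far above that range:

```python
# Python 3 sketch: check whether each character fits in Latin-1 (0-255).
text = '访视频'

# ord() gives the Unicode code point of each character.
code_points = [ord(ch) for ch in text]
fits_latin1 = all(cp < 256 for cp in code_points)
print([hex(cp) for cp in code_points], fits_latin1)

# Encoding to Latin-1 therefore fails, just like in the question.
try:
    text.encode('latin-1')
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    print(exc)
```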
To get the output you want, you want to decode the UTF-8 bytes as if they were Latin-1. Like this:
s = u'访视频'.encode('utf-8').decode('latin-1')
However, your desired output doesn't look like actual Latin-1, because in Latin-1, characters \x86 and \x91 are non-printable, so you're going to get this:
è®¿è§ é¢
(Notice that space in the middle in place of †, and the missing ‘ at the end; those are actually invisible control characters, not spaces.)
It looks like you want a Latin-1 superset, probably Windows codepage 1252. In which case what you really want is:
s = u'访视频'.encode('utf-8').decode('cp1252')
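A rough side-by-side of the two decodes, as a sketch assuming Python 3 (the variable names are mine, not from the question). It also shows that the mojibake can be reversed, since no bytes are lost along the way:

```python
# Python 3 sketch: decode the same UTF-8 bytes with two different codecs.
raw = '访视频'.encode('utf-8')      # 9 bytes: e8 ae bf e8 a7 86 e9 a2 91

as_latin1 = raw.decode('latin-1')   # 0x86 and 0x91 become C1 control characters
as_cp1252 = raw.decode('cp1252')    # 0x86 and 0x91 become the dagger and left quote

print(repr(as_latin1))
print(as_cp1252)

# Re-encoding with the same codec and decoding as UTF-8 undoes the damage.
restored = as_cp1252.encode('cp1252').decode('utf-8')
print(restored)
```

One caveat: Python's cp1252 codec leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so this works for this particular string but can raise UnicodeDecodeError for other input. Decoding as latin-1 never fails, because it maps all 256 byte values.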
You need to first encode to UTF-8, which can encode any Unicode string and yet is fully compatible with the 7-bit ASCII set (any ASCII bytestring is a valid UTF-8-encoded string), and then decode the resulting bytes as Latin-1:
>>> u'访视频'.encode('UTF-8').decode('latin-1')
u'\xe8\xae\xbf\xe8\xa7\x86\xe9\xa2\x91'
Note: The UTF-8 encoding can handle any Unicode character. It is also backwards
compatible with ASCII, so that a pure ASCII file can also be considered a UTF-8
file, and a UTF-8 file that happens to use only ASCII characters is identical to an
ASCII file with the same characters.
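The ASCII compatibility is easy to check for yourself. A small sketch, assuming Python 3 and using an arbitrary sample string:

```python
# Python 3 sketch: pure-ASCII text produces identical bytes in ASCII and UTF-8.
ascii_text = 'hello'

same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')
print(same_bytes)

# Conversely, ASCII bytes decode cleanly as UTF-8.
decoded = b'hello'.decode('utf-8')
print(decoded)
```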