I feel stacked here trying to change encodings with Python 2.5
I have XML response, which I encode to UTF-8: response.encode('utf-8')
. That is fine, but the program which uses this info doesn't like this encoding and I have to convert it to other code page. Real example is that I use ghostscript python module to embed pdfmark data to a PDF file - end result is with wrong characters in Acrobat.
I've done numerous combinations with .encode()
and .decode()
between 'utf-8' and 'latin-1' and it drives me crazy as I can't output correct result.
If I output the string to a file with .encode('utf-8')
and then convert this file from UTF-8 to CP1252 (aka latin-1) with i.e. iconv.exe and embed the data everything is fine.
Basically can someone help me convert i.e. character á which is UTF-8 encoded as hex: C3 A1
to latin-1 as hex: E1
?
Thanks in advance
Using the string encode() method, you can convert unicode strings into any encodings supported by Python. By default, Python uses utf-8 encoding.
To decode a string encoded in UTF-8 format, we can use the decode() method specified on strings. This method accepts two arguments, encoding and error . encoding accepts the encoding of the string to be decoded, and error decides how to handle errors that arise during decoding.
Python bytes decode() function is used to convert bytes to string object. Both these functions allow us to specify the error handling scheme to use for encoding/decoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError.
Instead of .encode('utf-8')
, use .encode('latin-1')
.
data="UTF-8 data"
udata=data.decode("utf-8")
data=udata.encode("latin-1","ignore")
Should do it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With