Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: convert string from UTF-8 to Latin-1

Tags:

I feel stacked here trying to change encodings with Python 2.5

I have XML response, which I encode to UTF-8: response.encode('utf-8'). That is fine, but the program which uses this info doesn't like this encoding and I have to convert it to other code page. Real example is that I use ghostscript python module to embed pdfmark data to a PDF file - end result is with wrong characters in Acrobat.

I've done numerous combinations with .encode() and .decode() between 'utf-8' and 'latin-1' and it drives me crazy as I can't output correct result.

If I output the string to a file with .encode('utf-8') and then convert this file from UTF-8 to CP1252 (aka latin-1) with i.e. iconv.exe and embed the data everything is fine.

Basically can someone help me convert i.e. character á which is UTF-8 encoded as hex: C3 A1 to latin-1 as hex: E1?

Thanks in advance

like image 223
romor Avatar asked Nov 28 '10 23:11

romor


People also ask

How do I change the encoding of a string in Python?

Using the string encode() method, you can convert unicode strings into any encodings supported by Python. By default, Python uses utf-8 encoding.

How do I decode a UTF-8 string in Python?

To decode a string encoded in UTF-8 format, we can use the decode() method specified on strings. This method accepts two arguments, encoding and error . encoding accepts the encoding of the string to be decoded, and error decides how to handle errors that arise during decoding.

What does decode () do in Python?

Python bytes decode() function is used to convert bytes to string object. Both these functions allow us to specify the error handling scheme to use for encoding/decoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError.


2 Answers

Instead of .encode('utf-8'), use .encode('latin-1').

like image 184
Ignacio Vazquez-Abrams Avatar answered Sep 21 '22 09:09

Ignacio Vazquez-Abrams


data="UTF-8 data"
udata=data.decode("utf-8")
data=udata.encode("latin-1","ignore")

Should do it.

like image 28
Utku Zihnioglu Avatar answered Sep 19 '22 09:09

Utku Zihnioglu