 

What is the foolproof way to convert a string (UTF-8 or otherwise) to a simple ASCII string in Python?

Inside my Python script, I get a string back from a function I didn't write. Its encoding varies. I need to convert it to ASCII. Is there a foolproof way of doing this? I don't mind replacing the non-ASCII characters with blanks or something else.

olamundo asked Nov 24 '09 20:11

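The question explicitly allows a lossy conversion. The following is a minimal sketch of that route, assuming Python 3 and assuming that any bytes input is UTF-8 (the question does not say which encodings the function actually returns). `to_ascii` and its `replacement` parameter are hypothetical names chosen for illustration.

```python
def to_ascii(value, replacement="?"):
    """Hypothetical helper: lossy conversion of str or bytes to pure ASCII."""
    if isinstance(value, bytes):
        # Assumption: bytes are UTF-8; undecodable bytes become U+FFFD.
        value = value.decode("utf-8", errors="replace")
    # Keep code points 0-127 and swap everything else for the replacement.
    return "".join(ch if ord(ch) < 128 else replacement for ch in value)


print(to_ascii("caf\u00e9"))        # caf?
print(to_ascii("caf\u00e9", " "))   # caf (with a trailing blank)
print(to_ascii(b"caf\xc3\xa9"))     # caf?
```

If a `?` per character is acceptable, the built-in `value.encode("ascii", errors="replace").decode("ascii")` does the same for `str` input; `errors="ignore"` drops the characters instead.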


People also ask

What is encoding='UTF-8' in Python?

UTF-8 uses 1, 2, 3 or 4 bytes to encode each code point. It is backward compatible with ASCII: every ASCII character (unaccented English letters, digits and common punctuation) needs just 1 byte, which is quite efficient. More bytes are needed only for characters outside ASCII. UTF-8 is the most widely used encoding, and in Python 3 it is the default source-file encoding and the default codec for str.encode() and bytes.decode().
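A short sketch of the 1-to-4-byte rule, assuming Python 3. `utf8_byte_lengths` is a hypothetical helper name; the sample characters are an ASCII letter, an accented Latin letter, the euro sign and an emoji.

```python
def utf8_byte_lengths(text):
    """Return how many bytes UTF-8 uses for each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


# "A", "é" (U+00E9), "€" (U+20AC), emoji U+1F600
sample = "A\u00e9\u20ac\U0001F600"
print(utf8_byte_lengths(sample))  # [1, 2, 3, 4]

# In Python 3, str.encode() with no argument uses UTF-8.
print("\u00e9".encode() == "\u00e9".encode("utf-8"))  # True
```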

Are UTF-8 and ASCII the same?

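For backward compatibility, the first 128 Unicode code points are the ASCII characters, and UTF-8 encodes each of them as a single byte with the same value. So any ASCII text is also valid UTF-8 with identical bytes: ASCII is a subset of UTF-8 (and of Unicode). The two are not the same, though, because UTF-8 can also encode every other Unicode character.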
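A minimal sketch of the subset relationship, assuming Python 3; `same_bytes_in_ascii_and_utf8` is a hypothetical name. Pure-ASCII text yields identical bytes under both codecs, while text with a code point above 127 cannot be encoded as ASCII at all.

```python
def same_bytes_in_ascii_and_utf8(text):
    """Hypothetical check: True if text encodes identically in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # text contains a code point above 127, which ASCII cannot represent
        return False


print(same_bytes_in_ascii_and_utf8("hello, world"))  # True
print(same_bytes_in_ascii_and_utf8("caf\u00e9"))     # False
```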

How do I decode a UTF-8 string in Python?

To decode UTF-8 data, call the decode() method. In Python 3 this method lives on bytes objects (str has no decode()). It accepts two arguments, encoding and errors: encoding names the codec the bytes were encoded with (UTF-8 by default), and errors decides how to handle bytes that cannot be decoded, e.g. 'strict' (the default, which raises UnicodeDecodeError), 'ignore' or 'replace'.
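A short sketch of decode() and its errors argument, assuming Python 3. The byte strings are illustrative: `raw` is valid UTF-8 for "café", while `bad` holds the Latin-1 byte 0xE9, which is not valid UTF-8 on its own.

```python
raw = b"caf\xc3\xa9"   # UTF-8 bytes for "café"
bad = b"caf\xe9"       # Latin-1 byte for "é": invalid as UTF-8

print(raw.decode("utf-8"))  # café

# The default errors="strict" raises on invalid input.
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict failed:", exc.reason)

# Other handlers trade information for a guaranteed result.
for handler in ("replace", "ignore", "backslashreplace"):
    print(handler, repr(bad.decode("utf-8", errors=handler)))
```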


1 Answer

If you want an ASCII string that unambiguously represents what you have got, without losing any information, the answer is simple:

Don't muck about with encode/decode; use the repr() function (Python 2.x) or the ascii() function (Python 3.x).
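A short Python 3 sketch of the ascii() approach. ascii() returns a quoted string literal in which every non-ASCII character is written as a \x, \u or \U escape, so the result is pure ASCII. The round trip through `ast.literal_eval` is an addition for illustration, showing that no information is lost.

```python
import ast

s = "caf\u00e9 \u20ac"   # "café €"
escaped = ascii(s)
print(escaped)            # 'caf\xe9 \u20ac'

# Every character of the result is ASCII (str.isascii needs Python 3.7+).
print(escaped.isascii())  # True

# The escaped form is a valid string literal, so the original is recoverable.
print(ast.literal_eval(escaped) == s)  # True
```

Note that the result includes the surrounding quotes, since ascii(), like repr(), produces a literal rather than plain text.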

John Machin answered Oct 13 '22 06:10
