String encoding and decoding?

Here are my attempts with error messages. What am I doing wrong?

string.decode("ascii", "ignore") 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 37: ordinal not in range(128)

string.encode('utf-8', "ignore") 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 37: ordinal not in range(128)

asked Jul 05 '12 07:07 by waigani

2 Answers

You can't decode a unicode object, and you can't encode a str. Try doing it the other way around: decode a str to get unicode, and encode a unicode to get a str.
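
For comparison, here is a minimal sketch of the same rule in Python 3. The question is about Python 2.x, so this is an illustration rather than a drop-in fix. In Python 3 the byte type is bytes and the text type is str, and each type only has the method that goes in the valid direction. The sample bytes are made up for illustration.

```python
# Python 3 sketch: bytes --decode--> str (text), str --encode--> bytes.
raw = b"price:\xc2\xa010"          # made-up UTF-8 bytes containing U+00A0 (no-break space)

text = raw.decode("utf-8")         # bytes -> str
back = text.encode("utf-8")        # str -> bytes
print(back == raw)                 # the round trip is lossless

# Going the "wrong" way is impossible in Python 3 because the methods don't exist.
print(hasattr(text, "decode"))     # False: str has no decode()
print(hasattr(raw, "encode"))      # False: bytes has no encode()
```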

answered Sep 22 '22 03:09 by Ignacio Vazquez-Abrams



I'm guessing at everything omitted from the original question, but assuming Python 2.x, the key is to read the error messages carefully. In particular, notice that where you call 'encode' the message says 'decode' and vice versa. Also notice the types of the values included in the messages.
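
Here is a small Python 3 sketch that reads those details programmatically. UnicodeEncodeError and UnicodeDecodeError carry the codec name, the input object, and the start/end positions of the offending data. The helper name describe_unicode_error is hypothetical, and the sample inputs are made up.

```python
def describe_unicode_error(exc):
    """Hypothetical helper: summarize a UnicodeEncodeError or UnicodeDecodeError."""
    # An *encode* error means the input was text; a *decode* error means it was bytes.
    direction = "encode" if isinstance(exc, UnicodeEncodeError) else "decode"
    return (f"{direction} failed with codec {exc.encoding!r}: "
            f"input type {type(exc.object).__name__}, "
            f"offending item {exc.object[exc.start:exc.end]!r} at position {exc.start}")

# Made-up inputs that trigger one error of each kind.
for action in (lambda: "abc\xa0".encode("ascii"), lambda: b"abc\xc2".decode("ascii")):
    try:
        action()
    except UnicodeError as exc:
        print(describe_unicode_error(exc))
```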

In the first example, string is of type unicode and you attempted to decode it. Decoding is an operation that converts a byte string to unicode. Python helpfully tried to convert the unicode value to str first, using the default 'ascii' encoding. Your string contained a non-ASCII character, so you got an error saying that Python was unable to encode a unicode value. Here's an example which shows the type of the input string:

>>> u"\xa0".decode("ascii", "ignore")
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    u"\xa0".decode("ascii", "ignore")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
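
To make the hidden step visible, here is a rough Python 3 model of what Python 2's unicode.decode did, based on the behaviour described above. py2_unicode_decode is a hypothetical name, not a real API. The 'ignore' you passed applies only to the second step, which is why it didn't help.

```python
def py2_unicode_decode(text, encoding, errors="strict"):
    """Hypothetical model of Python 2's unicode.decode(encoding, errors)."""
    # Implicit step: strict encode with the default 'ascii' codec (errors is NOT used here).
    implicit_bytes = text.encode("ascii")
    # The step you actually asked for.
    return implicit_bytes.decode(encoding, errors)

try:
    py2_unicode_decode("\xa0", "ascii", "ignore")
except UnicodeEncodeError as exc:
    # The failure comes from the implicit *encode*, hence "can't encode" in the message.
    print(type(exc).__name__, exc)

# The fix: a value that is already text should be encoded, not decoded.
print("\xa0".encode("ascii", "ignore"))   # b''  (the non-ASCII character is dropped)
print("\xa0".encode("utf-8"))             # b'\xc2\xa0'
```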

In the second case you do the reverse and attempt to encode a byte string. Encoding is an operation that converts unicode to a byte string, so Python helpfully tries to convert your byte string to unicode first. You didn't give it an ASCII string, so the default ASCII decoder fails:

>>> "\xc2".encode("ascii", "ignore")
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    "\xc2".encode("ascii", "ignore")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
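
Likewise, here is a rough Python 3 model of Python 2's str.encode, followed by the usual fix: decode the bytes with the encoding they are actually in. py2_str_encode is a hypothetical name. Treating the data as UTF-8 is an assumption. It is consistent with the question, though: 0xc2 followed by 0xa0 is how UTF-8 encodes u'\xa0', the character from the first error.

```python
def py2_str_encode(data, encoding, errors="strict"):
    """Hypothetical model of Python 2's str.encode(encoding, errors)."""
    # Implicit step: strict decode with the default 'ascii' codec (errors is NOT used here).
    implicit_text = data.decode("ascii")
    # The step you actually asked for.
    return implicit_text.encode(encoding, errors)

try:
    py2_str_encode(b"\xc2\xa0", "utf-8", "ignore")
except UnicodeDecodeError as exc:
    # The failure comes from the implicit *decode*, hence "can't decode" in the message.
    print(type(exc).__name__, exc)

# The fix: decode with the real encoding (assumed UTF-8), then encode as needed.
text = b"\xc2\xa0".decode("utf-8")
print(repr(text))                         # '\xa0'
print(text.encode("ascii", "ignore"))     # b''
```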
answered Sep 19 '22 03:09 by Duncan