
What happens when encode is used on str in python?

I understand the basics of unicode, encoding, and decoding, but I don't understand why the encode method works on the str type. I expected it to work only on the unicode type. So my question is: what does encode do when it is called on a str rather than a unicode object?

Ali Baba asked Feb 26 '16 21:02



1 Answer

In Python 2 there are two types of codecs available: those that convert between str and unicode, and those that convert from str to str. Examples of the latter are the base64 and rot13 codecs.

The str.encode() method exists to support the latter:

'binary data'.encode('base64')

But now that it exists, people also use it for the unicode -> str codecs. Encoding proper can only go from unicode to str (and decoding the other way), so to support these calls Python first implicitly decodes your str value to unicode, using the ASCII codec, and only then encodes. This is why calling encode on a str that contains non-ASCII bytes raises a UnicodeDecodeError.
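The two-step behaviour can be sketched in Python 3, where bytes plays the role of the old str. This is only an emulation under that assumption, not what the interpreter literally runs, and the helper name py2_style_str_encode is made up for illustration:

```python
def py2_style_str_encode(raw: bytes, encoding: str) -> bytes:
    """Hypothetical helper mimicking Python 2's str.encode() for text codecs."""
    # Step 1: the implicit decode, always with the ASCII codec.
    text = raw.decode('ascii')
    # Step 2: the encode the caller actually asked for.
    return text.encode(encoding)


# Pure ASCII input survives the round trip unchanged.
ascii_result = py2_style_str_encode(b'hello', 'utf-8')

# Non-ASCII bytes fail in step 1, before any encoding happens.
try:
    py2_style_str_encode('caf\u00e9'.encode('utf-8'), 'utf-8')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc
```

Note that the error is a decode error even though the caller asked to encode, which is the usual source of confusion with this pattern.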

Incidentally, when using a str -> str codec on a unicode object, Python first implicitly encodes to str using the same ASCII codec.

In Python 3, this has been solved by a) removing the bytes.encode() and str.decode() methods (remember that bytes is roughly the old str, and str the new unicode), and b) making the str -> str encodings available only through the codecs module, via the codecs.encode() and codecs.decode() functions. Which codecs transform between the same type has also been clarified and updated; see the Python Specific Encodings section of the codecs documentation. Note that the 'text' encodings listed there, where available in Python 2, encode to str instead.
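As a rough illustration of the Python 3 side, assuming Python 3.4 or later (where the base64 and rot_13 codec names are available through the codecs functions):

```python
import codecs

# bytes -> bytes codec: only reachable through the codecs module.
encoded = codecs.encode(b'binary data', 'base64')
decoded = codecs.decode(encoded, 'base64')

# str -> str codec ("text" encoding in the documentation's terms).
rotated = codecs.encode('Hello', 'rot_13')

# The methods that caused the confusion are gone.
has_bytes_encode = hasattr(bytes, 'encode')
has_str_decode = hasattr(str, 'decode')
```

Here has_bytes_encode and has_str_decode are both False, so the implicit ASCII round trip described above can no longer happen by accident.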

Martijn Pieters answered Oct 03 '22 01:10
