I don't understand encode and decode in Python (2.7.3)

I have tried to understand encode and decode in Python on my own, but nothing is really clear to me.

  1. str.encode([encoding[, errors]])
  2. str.decode([encoding[, errors]])

First, I don't understand the need for the "encoding" parameter in these two functions.

What is the output of each function, its encoding? What is the use of the "encoding" parameter in each function? I don't really understand the definition of "bytes string".

I have an important question: is there some way to convert from one encoding to another? I have read some text on ASN.1 about "octet string", so I wondered whether it was the same as "bytes string".

Thanks for your help.

Narcisse Doudieu Siewe asked Jul 21 '12 23:07



1 Answer

It's a little more complex in Python 2 (compared to Python 3), since it conflates the concepts of 'string' and 'bytestring' quite a bit, but see "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets". Essentially, what you need to understand is that 'string' and 'character' are abstract concepts that can't be directly represented by a computer. A bytestring is a raw stream of bytes straight from disk (or that can be written straight to disk). encode goes from abstract to concrete (you give it preferably a unicode string, and it gives you back a byte string); decode goes the opposite way.
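
As a rough illustration of the two directions, here is a minimal sketch that should run unchanged on Python 2.7 and Python 3. The variable names (text, raw) are only illustrative, and UTF-8 is an arbitrary choice of encoding for the example.

```python
# u'...' marks an abstract text (unicode) string; b'...' marks a byte string.
text = u'caf\u00e9'              # 4 characters: c, a, f, e-acute

# encode: abstract text -> concrete bytes, using the rule named by the argument
raw = text.encode('utf-8')
assert raw == b'caf\xc3\xa9'     # e-acute becomes two bytes under UTF-8
assert len(text) == 4            # counted in characters
assert len(raw) == 5             # counted in bytes

# decode: concrete bytes -> abstract text, using the same rule
assert raw.decode('utf-8') == text
```

The character count and the byte count differ, which is the practical sign that the two objects are not the same kind of thing.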

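The encoding is the rule that maps characters to bytes. UTF-8, for example, says that 'a' should be represented by the byte 0x61 and 'α' by the two-byte sequence 0xce 0xb1. That is why both functions take an "encoding" parameter: it names the rule to apply.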
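
To make the "rule" idea concrete, and to address the question about going from one encoding to another, here is a small sketch (again meant to run on both Python 2.7 and Python 3). The helper name transcode is hypothetical and not part of Python; it only wraps a decode followed by an encode. The choice of ISO-8859-7 and UTF-16 as comparison encodings is arbitrary.

```python
alpha = u'\u03b1'                            # the Greek letter alpha

# The same character maps to different bytes under different rules.
utf8_bytes = alpha.encode('utf-8')           # b'\xce\xb1'
greek_bytes = alpha.encode('iso-8859-7')     # b'\xe1'
utf16_bytes = alpha.encode('utf-16-be')      # b'\x03\xb1'


def transcode(data, source, target):
    """Hypothetical helper: decode bytes with one rule, re-encode with another."""
    return data.decode(source).encode(target)


# Bytes never convert directly; they pass through abstract text on the way.
assert transcode(greek_bytes, 'iso-8859-7', 'utf-8') == utf8_bytes
assert transcode(utf8_bytes, 'utf-8', 'utf-16-be') == utf16_bytes
```

The source encoding has to be known in advance: the bytes themselves do not record which rule produced them.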

lvc answered Sep 23 '22 15:09