I tried to understand encode and decode in Python by myself, but nothing is really clear to me.
str.encode([encoding,[errors]])
str.decode([encoding,[errors]])
First, I don't understand why these two functions need the "encoding" parameter.
What is the output of each function, its encoding? What is the use of the "encoding" parameter in each function? I don't really understand the definition of "bytes string".
I also have an important question: is there some way to convert from one encoding to another? I have read some text on ASN.1 about "octet string", so I wondered whether it was the same as "bytes string".
Thanks for your help.
decode() is a method of str in Python 2 (in Python 3 it is a method of bytes). It takes the name of the encoding in which the byte string is encoded, interprets the bytes according to that encoding, and returns the original Unicode string. It is the inverse of encode().
In the Python programming language, encoding represents a Unicode string as a string of bytes. This commonly occurs when you transfer an instance over a network or save it to a disk file. Decoding transforms a string of bytes into a Unicode string.
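As a minimal sketch of that round trip, assuming Python 3 (where text is str and raw bytes are bytes), the variable names below are only illustrative:

```python
# Python 3: str holds Unicode text, bytes holds raw bytes.
text = "caf\u00e9"                 # "café", 4 code points

data = text.encode("utf-8")        # str -> bytes
print(data)                        # b'caf\xc3\xa9' (5 bytes: é takes two)

restored = data.decode("utf-8")    # bytes -> str
print(restored == text)            # True
```

The "encoding" argument names the rule used for the conversion, so the same name has to be used in both directions to get the original text back.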
The encode() method in Python returns the encoded form of a given string. The code points are translated into a series of bytes so that the string can be stored or transmitted. This process is called encoding. In Python 3, encode() uses UTF-8 when no encoding is given; in Python 2 the default is ASCII.
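The signatures quoted in the question also take an optional "errors" argument. The sketch below, assuming Python 3, shows what happens when a character has no representation in the chosen encoding, and how "errors" changes that behaviour:

```python
text = "\u03b1 = alpha"            # starts with the Greek letter alpha

# Default error handling is "strict": unencodable characters raise.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)               # True

# Other handlers substitute or drop the offending character instead.
replaced = text.encode("ascii", errors="replace")
ignored = text.encode("ascii", errors="ignore")
print(replaced)                    # b'? = alpha'
print(ignored)                     # b' = alpha'
```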
In Python 2, the str type was used for two different kinds of values, text and bytes, whereas in Python 3 these are separate and incompatible types. Text contains human-readable messages, represented as a sequence of Unicode code points. Usually, it does not contain unprintable control characters such as \0.
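A short sketch of that separation, assuming Python 3: the two values below look alike, but they are different types and cannot be mixed without an explicit encode or decode.

```python
text = "abc"       # str: a sequence of Unicode code points
data = b"abc"      # bytes: a sequence of integers in range 0..255

print(type(text).__name__, type(data).__name__)   # str bytes
print(text == data)                               # False
print(data[0])                                    # 97, the byte value of 'a'

# Mixing the two types raises instead of converting silently.
try:
    combined = text + data
    mixing_failed = False
except TypeError:
    mixing_failed = True
print(mixing_failed)                              # True
```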
It's a little more complex in Python 2 (compared to Python 3), since it conflates the concepts of 'string' and 'bytestring' quite a bit, but see "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets". Essentially, what you need to understand is that 'string' and 'character' are abstract concepts that can't be directly represented by a computer. A bytestring is a raw stream of bytes straight from disk (or one that can be written straight to disk). encode goes from abstract to concrete (you give it preferably a unicode string, and it gives you back a byte string); decode goes the opposite way.
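This also answers the question about passing from one encoding to another: decode the bytes with the source encoding, then encode the resulting text with the target encoding. A minimal sketch, assuming Python 3; transcode is a hypothetical helper name, not a standard library function:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one encoding to another via a Unicode string."""
    text = data.decode(source)      # concrete -> abstract (bytes -> str)
    return text.encode(target)      # abstract -> concrete (str -> bytes)


latin1_bytes = b"caf\xe9"           # "café" encoded as ISO-8859-1
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes)                   # b'caf\xc3\xa9'
```

The text in the middle is the same in both cases; only its byte representation changes.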
The encoding is the rule that says, in UTF-8 for example, that 'a' should be represented by the byte 0x61 and 'α' by the two-byte sequence 0xce 0xb1.
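To make the "rule" idea concrete, the sketch below (assuming Python 3) encodes the same two characters and prints the resulting bytes in hexadecimal. The extra encodings are only there to show that a different rule gives different bytes for the same character:

```python
alpha = "\u03b1"                               # the Greek letter α

a_utf8 = "a".encode("utf-8")
alpha_utf8 = alpha.encode("utf-8")
alpha_utf16 = alpha.encode("utf-16-be")
alpha_greek = alpha.encode("iso-8859-7")

print(a_utf8.hex())                            # 61
print(alpha_utf8.hex())                        # ceb1
print(alpha_utf16.hex())                       # 03b1
print(alpha_greek.hex())                       # e1
```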