Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 3 - Encode/Decode vs Bytes/Str [duplicate]

I am new to python3, coming from python2, and I am a bit confused with unicode fundamentals. I've read some good posts, that made it all much clearer, however I see there are 2 methods on python 3, that handle encoding and decoding, and I'm not sure which one to use.

So the idea in python 3 is, that every string is unicode, and can be encoded and stored in bytes, or decoded back into unicode string again.

But there are 2 ways to do it:
u'something'.encode('utf-8') will generate b'something', but so does bytes(u'something', 'utf-8').
And b'bytes'.decode('utf-8') seems to do the same thing as str(b'bytes', 'utf-8').

Now my question is, why are there 2 methods that seem to do the same thing, and is either better than the other (and why?) I've been trying to find answer to this on google, but no luck.

>>> original = '27岁少妇生孩子后变老' >>> type(original) <class 'str'> >>> encoded = original.encode('utf-8') >>> print(encoded) b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81' >>> type(encoded) <class 'bytes'> >>> encoded2 = bytes(original, 'utf-8') >>> print(encoded2) b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81' >>> type(encoded2) <class 'bytes'> >>> print(encoded+encoded2) b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x8127\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81' >>> decoded = encoded.decode('utf-8') >>> print(decoded) 27岁少妇生孩子后变老 >>> decoded2 = str(encoded2, 'utf-8') >>> print(decoded2) 27岁少妇生孩子后变老 >>> type(decoded) <class 'str'> >>> type(decoded2) <class 'str'> >>> print(str(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81', 'utf-8')) 27岁少妇生孩子后变老 >>> print(b'27\xe5\xb2\x81\xe5\xb0\x91\xe5\xa6\x87\xe7\x94\x9f\xe5\xad\xa9\xe5\xad\x90\xe5\x90\x8e\xe5\x8f\x98\xe8\x80\x81'.decode('utf-8')) 27岁少妇生孩子后变老 
like image 635
if __name__ is None Avatar asked Jan 23 '13 04:01

if __name__ is None


People also ask

What is the difference between encode and decode Python?

To represent a unicode string as a string of bytes is known as encoding. To convert a string of bytes to a unicode string is known as decoding.

What is the difference between decode and encode?

Decoding is the process of reading words in text. When a child reads the words 'The ball is big,' for example, it is necessary to understand what the letters are, the sounds made by each letter and how they blend together to create words. Encoding is the process of using letter/sound knowledge to write.

What is the default encoding for bytes decode () in Python 3?

Python bytes decode() function is used to convert bytes to string object. Both these functions allow us to specify the error handling scheme to use for encoding/decoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError.

What is the difference between text encoding in Python 2 and Python 3?

In Python 2, the str type was used for two different kinds of values – text and bytes, whereas in Python 3, these are separate and incompatible types. Text contains human-readable messages, represented as a sequence of Unicode codepoints. Usually, it does not contain unprintable control characters such as \0 .


2 Answers

Neither is better than the other, they do exactly the same thing. However, using .encode() and .decode() is the more common way to do it. It is also compatible with Python 2.

like image 113
Lennart Regebro Avatar answered Sep 24 '22 21:09

Lennart Regebro


To add to Lennart Regebro's answer There is even the third way that can be used:

encoded3 = str.encode(original, 'utf-8') print(encoded3) 

Anyway, it is actually exactly the same as the first approach. It may also look that the second way is a syntactic sugar for the third approach.


A programming language is a means to express abstract ideas formally, to be executed by the machine. A programming language is considered good if it contains constructs that one needs. Python is a hybrid language -- i.e. more natural and more versatile than pure OO or pure procedural languages. Sometimes functions are more appropriate than the object methods, sometimes the reverse is true. It depends on mental picture of the solved problem.

Anyway, the feature mentioned in the question is probably a by-product of the language implementation/design. In my opinion, this is a nice example that show the alternative thinking about technically the same thing.

In other words, calling an object method means thinking in terms "let the object gives me the wanted result". Calling a function as the alternative means "let the outer code processes the passed argument and extracts the wanted value".

The first approach emphasizes the ability of the object to do the task on its own, the second approach emphasizes the ability of an separate algoritm to extract the data. Sometimes, the separate code may be that much special that it is not wise to add it as a general method to the class of the object.

like image 30
pepr Avatar answered Sep 23 '22 21:09

pepr