Python 3: what is wrong with my Unicode string?

Tags:

python

python-3.x

unicode

Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> str_version = 'នយោបាយ'
>>> type(str_version)
<class 'str'>
>>> print (str_version)
នយោបាយ
>>> unicode_version = 'នយោបាយ'.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    unicode_version = 'នយោបាយ'.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
>>>

What is the problem with my Unicode string?

asked Mar 26 '11 20:03

kn3l

2 Answers

There is nothing wrong with your string! You have just confused encode() and decode(). A str holds meaningful symbols (Unicode characters), not bytes. To turn it into bytes that can be stored in a file or transmitted over the Internet, call encode() with an encoding such as UTF-8. Each encoding is a scheme for converting meaningful symbols into a flat sequence of bytes.

When the time comes to do the opposite, taking raw bytes from a file or a socket and turning them back into symbols like letters and numbers, you decode the bytes with the decode() method of bytes objects in Python 3.

>>> str_version = 'នយោបាយ'
>>> str_version.encode('utf-8')
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'

See that long line of bytes? Those are the bytes UTF-8 uses to represent your string when you need to transmit it over a network or store it in a document. Many other encodings are in use, but UTF-8 seems to be the most popular. Each encoding turns meaningful symbols like ន and យោ into bytes, the little 8-bit numbers with which computers communicate.

>>> rawbytes = str_version.encode('utf-8')
>>> rawbytes
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
>>> rawbytes.decode('utf-8')
'នយោបាយ'

answered Sep 21 '22 13:09

Brandon Rhodes

You're reading the 2.x docs. str.decode() and bytes.encode() were dropped in 3.x. In Python 3, str is already a Unicode string, so there's no need to decode it.
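
A quick way to see this on Python 3 is to check which methods each type actually has. The sketch below also includes to_text(), a hypothetical helper (not from any library) for code that may receive either str or bytes; the default of UTF-8 is an assumption.

```python
text = 'នយោបាយ'
raw = text.encode('utf-8')

# On Python 3, str only encodes and bytes only decodes.
print(hasattr(text, 'decode'))  # False
print(hasattr(raw, 'encode'))   # False
print(hasattr(text, 'encode'), hasattr(raw, 'decode'))  # True True


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return a str whether given str or bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already a str, nothing to decode


print(to_text(raw) == to_text(text))  # True
```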

answered Sep 20 '22 13:09

Ignacio Vazquez-Abrams
