Converting a byte string to a Unicode string

Tags: python, string, type-conversion, python-3.x, unicode

I have the following code:

    a = "\u0432"
    b = u"\u0432"
    c = b"\u0432"
    d = c.decode('utf8')

    print(type(a), a)
    print(type(b), b)
    print(type(c), c)
    print(type(d), d)

And the output:

    <class 'str'> в
    <class 'str'> в
    <class 'bytes'> b'\\u0432'
    <class 'str'> \u0432

Why do I see a character code in the last case instead of the character? How can I convert a byte string to a Unicode string so that the output shows the character rather than its code?

880

asked Dec 12 '12 10:12

Alex T

1 Answer

In strings (or Unicode objects in Python 2), \u has a special meaning: it says, "here comes a Unicode character specified by its Unicode code point". Hence u"\u0432" results in the character в.
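As a small illustration of the point above, here is a sketch for Python 3 showing that the \u escape in a str literal produces a single character. The variable name s is just for illustration.

```python
import unicodedata

# In a str literal, \u0432 is one escape sequence that yields one character.
s = "\u0432"

print(len(s))               # number of characters in the string
print(hex(ord(s)))          # the code point of that character
print(unicodedata.name(s))  # its official Unicode name
```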

The b'' prefix tells you this is a sequence of 8-bit bytes. A bytes object contains no Unicode characters, so the \u escape has no special meaning there. Hence b"\u0432" is just the sequence of the bytes \, u, 0, 4, 3 and 2.
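To make this concrete, the sketch below inspects the bytes object from the question. It spells the backslash as \\ so that the literal is unambiguous (recent Python versions warn about unrecognized escapes such as \u in bytes literals); the resulting value is the same six bytes. It also shows why decoding as UTF-8 returns the escape text unchanged: all six bytes are plain ASCII.

```python
# Same value as b"\u0432" in the question, with the backslash written explicitly.
c = b"\\u0432"

print(len(c))   # 6 bytes, not 1 character
print(list(c))  # the ASCII codes of \, u, 0, 4, 3, 2

# Decoding as UTF-8 maps each ASCII byte to the matching character,
# so the result is the six-character string backslash-u-0-4-3-2.
d = c.decode("utf8")
print(len(d))   # still 6
print(d)        # the literal text \u0432, not the Cyrillic letter
```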

Essentially you have an 8-bit string containing not a Unicode character, but the specification of a Unicode character.

You can convert this specification by decoding it with the unicode_escape codec.

    >>> c.decode('unicode_escape')
    'в'
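A slightly fuller sketch of the same idea follows, with one caveat that is easy to miss: unicode_escape interprets any non-escaped bytes as Latin-1, so it is only appropriate for data that really contains escape text. If the bytes are genuine UTF-8 (here produced with str.encode for comparison), decoding with utf-8 is the right call. The variable names are illustrative only.

```python
# Case 1: the bytes hold the *text* of an escape sequence (as in the question).
escaped = b"\\u0432"
text = escaped.decode("unicode_escape")
print(len(text), hex(ord(text)))   # one character, code point 0x432

# Case 2: the bytes hold the character itself, encoded as UTF-8.
utf8_bytes = "\u0432".encode("utf-8")
print(utf8_bytes)                  # two bytes: b'\xd0\xb2'
same_text = utf8_bytes.decode("utf-8")
print(same_text == text)           # True

# Caveat: unicode_escape on real UTF-8 data treats each byte as Latin-1,
# which yields two unrelated characters instead of the Cyrillic letter.
wrong = utf8_bytes.decode("unicode_escape")
print(len(wrong))                  # 2
```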

104

answered Oct 11 '22 16:10

Lennart Regebro
