Difference between binary string, byte string, unicode string and an ordinary string (str)

Tags: python

I'm a little confused. In Python, what is the difference between a binary string, a byte string, a unicode string and a plain old string (str)? I'm using Python 2.6.

Imran Azad asked Nov 21 '11 22:11


People also ask

What is the difference between byte string and Unicode string?

In Python 3, each element of a str is one Unicode code point. Because Unicode has far more than 256 code points, encodings such as UTF-8 need more than one byte for many characters. A bytes object gives you access to those underlying bytes.

What is the difference between Unicode string and string?

Unicode, on the other hand, defines well over 100,000 characters. A character can therefore need more than one byte once it is encoded, so you have to distinguish between characters and bytes. In Python 2, standard strings (str) are really byte strings, and a Python character is really a byte.

What is the difference between STR and bytes?

bytes consists of sequences of 8-bit unsigned values, while str consists of sequences of Unicode code points that represent textual characters from human languages.
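
A minimal Python 3 sketch of that distinction, using an arbitrary sample word chosen only for illustration: indexing a str gives back a one-character str, while indexing bytes gives back an integer between 0 and 255.

```python
# Python 3 sketch; the sample text is an arbitrary illustration.
text = "h\u00e9llo"            # str: a sequence of Unicode code points ("héllo")
data = text.encode("utf-8")    # bytes: a sequence of 8-bit unsigned values

first_char = text[0]           # a one-character str
first_byte = data[0]           # an int in range(256)

# The lengths differ because "é" needs two bytes in UTF-8.
print(len(text), len(data))
print(repr(first_char), first_byte)
```

The encoding is named explicitly here; a different encoding would generally produce a different byte sequence for the same text.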

What is the difference between binary and string?

The most general difference is that non-binary strings have a character set and consist of characters in that character set, whereas binary strings consist simply of bytes that are distinguished only by their numeric values.


1 Answer

It depends on the version of Python you are using.

In Python 2.x, if you write 'abc' it has type str, but this means a byte string. If you want a Unicode string, you must write u'abc'.

In Python 3.x, if you write 'abc' it still has type str, but now this means a string of Unicode characters. If you want a byte string, you must write b'abc'. Writing u'abc' is a syntax error in Python 3.0 through 3.2; Python 3.3 and later accept it again as a synonym for 'abc'.

        |  2.x                     |  3.x
--------+--------------------------+------------------------
Bytes   |  'abc' <type 'str'>      |  b'abc' <class 'bytes'>
Unicode | u'abc' <type 'unicode'>  |   'abc' <class 'str'>
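
As a rough illustration of the 3.x column, the sketch below checks the literal types and converts between them. It assumes Python 3 and uses ASCII as the encoding purely for the example; the variable names are illustrative.

```python
# Python 3 sketch of the 3.x column; names and the ASCII encoding are assumptions.
byte_string = b'abc'
text_string = 'abc'

print(type(byte_string))   # <class 'bytes'>
print(type(text_string))   # <class 'str'>

# bytes and str never compare equal, even when the content looks the same.
print(byte_string == text_string)  # False

# Converting between the two requires naming an encoding explicitly.
encoded = text_string.encode('ascii')   # str -> bytes
decoded = byte_string.decode('ascii')   # bytes -> str
```

In Python 2 the same conversions exist between str (bytes) and unicode, but mixing the two types triggers implicit ASCII conversions instead of failing up front.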
Mark Byers answered Sep 28 '22 10:09