I'm a little confused. In Python what is the difference between a binary string, byte string, unicode string and a plain old string (str)? I'm using Python 2.6.
In Python 3, each character in a str represents one Unicode code point. Because Unicode has far more than 256 characters, its encodings (UTF-8, for example) need more than one byte to represent many of them. bytes objects give you access to those underlying bytes.
Unicode has far more characters than fit in a single byte, so most of them take more than one byte when encoded, and you need to distinguish between characters and bytes. In Python 2, standard strings (str) are really byte strings, and a character in one is really a byte.
In Python 3, a bytes object is a sequence of 8-bit unsigned values, while a str is a sequence of Unicode code points that represent textual characters from human languages.
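To make the characters-versus-bytes distinction concrete, here is a minimal sketch that should run unchanged on Python 2.6+ and on Python 3.3+. The variable names and the sample word are made up for illustration.

```python
# -*- coding: utf-8 -*-
# u'...' is a Unicode string on Python 2 and on Python 3.3+.
text = u'caf\u00e9'           # "café": four code points

# Encoding turns code points into bytes; UTF-8 needs two bytes for U+00E9.
data = text.encode('utf-8')

print(len(text))              # 4 characters
print(len(data))              # 5 bytes

# Decoding with the same encoding recovers the original text.
round_trip = data.decode('utf-8')
print(round_trip == text)     # True
```

The lengths differ because the last character takes two bytes in UTF-8, while the three ASCII letters take one byte each.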
The most general difference is that non-binary strings have a character set and consist of characters in that character set, whereas binary strings consist simply of bytes that are distinguished only by their numeric values.
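A small sketch of that point, assuming Python 2.6+ or Python 3.3+: the bytes are just numbers, and which numbers you get for a given character depends on the character set (encoding) you choose. The names below are invented for the example.

```python
# A byte string is only a sequence of numeric values.
# bytearray() is used so the result is the same on Python 2 and 3.
data = b'abc'
values = list(bytearray(data))
print(values)                        # [97, 98, 99]

# The same character becomes different bytes under different encodings.
char = u'\u00e9'                     # LATIN SMALL LETTER E WITH ACUTE
as_utf8 = char.encode('utf-8')       # two bytes: 0xC3 0xA9
as_latin1 = char.encode('latin-1')   # one byte: 0xE9

print(list(bytearray(as_utf8)))      # [195, 169]
print(list(bytearray(as_latin1)))    # [233]
```

Without knowing the encoding, there is no way to tell from the bytes alone which character was meant.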
It depends on the version of Python you are using.
In Python 2.x, if you write 'abc' it has type str, but this means a byte string. If you want a Unicode string you must write u'abc'.
In Python 3.x, if you write 'abc' it still has type str, but now this means a string of Unicode characters. If you want a byte string you must write b'abc'. Writing u'abc' is a syntax error in Python 3.0 through 3.2; Python 3.3 and later accept it again, with the same meaning as 'abc'.
        | 2.x                      | 3.x
--------+--------------------------+------------------------
Bytes   | 'abc' <type 'str'>       | b'abc' <class 'bytes'>
Unicode | u'abc' <type 'unicode'>  | 'abc' <class 'str'>
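If you want to check which row of the table applies to your interpreter, something like the following sketch should work on Python 2.6+ and Python 3.3+. The helper name literal_kinds is hypothetical, not part of any library.

```python
import sys

def literal_kinds():
    """Return the type name each kind of string literal produces here."""
    return {
        'plain': type('abc').__name__,
        'b': type(b'abc').__name__,   # the b prefix is accepted from Python 2.6 on
        'u': type(u'abc').__name__,   # the u prefix is accepted on 2.x and 3.3+
    }

kinds = literal_kinds()
print(sys.version_info[0], kinds)
```

On Python 2 the plain and b literals should both report str and the u literal should report unicode; on Python 3 the plain and u literals should report str and the b literal should report bytes.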