Apparently, the following is the valid syntax:
my_string = b'The string' I would like to know:
b character in front of the string mean?I found a related question right here on SO, but that question is about PHP though, and it states the b is used to indicate the string is binary, as opposed to Unicode, which was needed for code to be compatible from version of PHP < 6, when migrating to PHP 6. I don't think this applies to Python.
I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
The b denotes a byte string. Bytes are the actual data.
The b" notation is used to specify a bytes string in Python. Compared to the regular strings, which have ASCII characters, the bytes string is an array of byte variables where each hexadecimal element has a value between 0 and 255.
The b prefix signifies a bytes string literal. If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object.
The second string is prefixed with the 'b,' which says that it produces byte data type instead of the string data type. Hence, you can see the output.
To quote the Python 2.x documentation:
A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
Python 3.x makes a clear distinction between the types:
str = '...' literals = a sequence of Unicode characters (Latin-1, UCS-2 or UCS-4, depending on the widest character in the string)bytes = b'...' literals = a sequence of octets (integers between 0 and 255)If you're familiar with:
str as String and bytes as byte[];str as NVARCHAR and bytes as BINARY or BLOB;str as REG_SZ and bytes as REG_BINARY.If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.
You use str when you want to represent text.
print('שלום עולם') You use bytes when you want to represent low-level binary data like structs.
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0] You can encode a str to a bytes object.
>>> '\uFEFF'.encode('UTF-8') b'\xef\xbb\xbf' And you can decode a bytes into a str.
>>> b'\xE2\x82\xAC'.decode('UTF-8') '€' But you can't freely mix the two types.
>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: can't concat bytes to str The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.
>>> b'A' == b'\x41' True But I must emphasize, a character is not a byte.
>>> 'A' == b'A' False Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
unicode = u'...' literals = sequence of Unicode characters = 3.x str str = '...' literals = sequences of confounded bytes/characters struct.pack output.In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.
So yes, b'...' literals in Python have the same purpose that they do in PHP.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With