
Are Python's bytes objects also known as strings?

Tags: python, string

This is a section from Dive Into Python 3 regarding strings:

In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
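To make the quoted distinction concrete, here is a minimal sketch (my own illustration, not from the book) that encodes one string in two encodings and decodes it back. The sample text and variable names are assumptions chosen for the example.

```python
# A str is a sequence of Unicode characters; it has no encoding of its own.
text = "caf\u00e9"  # "café", written with an escape so the source file's encoding does not matter

# Encoding produces bytes; the result depends on the encoding chosen.
utf8_bytes = text.encode("utf-8")     # é becomes two bytes in UTF-8
cp1252_bytes = text.encode("cp1252")  # é becomes one byte in CP-1252

print(type(text).__name__, len(text))
print(type(utf8_bytes).__name__, len(utf8_bytes))
print(type(cp1252_bytes).__name__, len(cp1252_bytes))

# Decoding with the matching encoding recovers the original string.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```

The same four characters give five bytes in one encoding and four in the other, which is why asking whether a string "is utf-8" has no answer.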

Earlier today I used the hashlib module and read the help text for md5 that says:

Return a new MD5 hash object; optionally initialized with a string.

Well, it doesn't accept a string - it accepts a bytes object.
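A short sketch of what I mean, assuming a standard Python 3 build where MD5 is available. The input value and variable names are made up for illustration.

```python
import hashlib

data = "hello"

# Passing a str raises TypeError in Python 3: the data must be encoded first.
try:
    hashlib.md5(data)
    str_accepted = True
except TypeError:
    str_accepted = False

# Encoding the str to bytes first works as expected.
digest = hashlib.md5(data.encode("utf-8")).hexdigest()

print(str_accepted)
print(digest)
```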

Maybe I'm reading too much into this, but wouldn't it make more sense if the help text stated that a bytes object should be used instead? Or are people using the same name for strings and bytes?

asked Aug 11 '11 20:08 by roqvist



1 Answer

In Python 2, str was used for both character strings and bytes. In fact, until Python 2.6 there wasn't even a bytes type (and in 2.6 and 2.7, bytes is str).

The mentioned inconsistencies in the hashlib documentation are an artifact of this history.
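For contrast, a minimal sketch of how Python 3 keeps the two types apart, whereas in 2.6 and 2.7 bytes was just another name for str. The variable names here are only for illustration.

```python
raw = b"abc"   # bytes: a sequence of integers in the range 0-255
text = "abc"   # str: a sequence of Unicode characters

# In Python 3 these are two separate types.
types_distinct = bytes is not str

# Indexing shows the difference: bytes yield ints, str yields 1-character strings.
first_byte = raw[0]
first_char = text[0]

print(types_distinct, first_byte, first_char)
```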

answered Oct 05 '22 11:10 by phihag