
Why does base64.b64encode() return a bytes object?

The purpose of base64.b64encode() is to convert binary data into ASCII-safe "text". However, the method returns an object of type bytes:

>>> import base64
>>> base64.b64encode(b'abc')
b'YWJj'

It's easy to take that output and decode() it, but my question is: what is the significance of base64.b64encode() returning bytes rather than str?

gardarh asked Mar 13 '17 21:03



2 Answers

The purpose of the base64.b64encode() function is to convert binary data into ASCII-safe "text"

Python disagrees with that - base64 has been intentionally classified as a binary transform.

It was a design decision in Python 3 to force the separation of bytes and text and prohibit implicit transformations. Python is now so strict about this that bytes.encode doesn't even exist, and so b'abc'.encode('base64') would raise an AttributeError.
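The behaviour described above can be checked directly. This is a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative only.

```python
import base64
import codecs

# b64encode takes bytes and hands bytes back.
encoded = base64.b64encode(b"abc")
print(type(encoded).__name__, encoded)

# bytes objects have no .encode() method in Python 3.
try:
    b"abc".encode("base64")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)

# The codec still exists, but only as a bytes-to-bytes transform
# reached through codecs.encode(); it appends a trailing newline.
via_codec = codecs.encode(b"abc", "base64")
print(via_codec)
```

The codecs.encode() call is included only to show that the transform stays within the bytes domain on both sides.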

The opinion the language takes is that a bytestring object is already encoded. A codec which encodes bytes into text does not fit into this paradigm, because when you want to go from the bytes domain to the text domain it's a decode. Note that rot13 encoding was also banished from the list of standard encodings for the same reason - it didn't fit properly into the Python 3 paradigm.

There is also a performance argument to be made: suppose Python automatically decoded the base64 output, which is an ASCII-encoded binary representation produced by C code in the binascii module, into a Python object in the text domain. If you actually wanted the bytes, you would then have to undo that decoding by encoding to ASCII again. That would be a wasteful round-trip, an unnecessary double negation. It is better to make the decode-to-text step opt-in.
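The round-trip can be sketched as follows. This assumes Python 3.6 or later (for the newline keyword of binascii.b2a_base64), and the str-returning variant is hypothetical, simulated here with an explicit decode.

```python
import base64
import binascii

data = b"abc"

# The binascii routine already produces bytes; base64.b64encode
# returns the same value.
raw = binascii.b2a_base64(data, newline=False)
same = raw == base64.b64encode(data)

# Hypothetical str-returning variant: decode once...
as_text = raw.decode("ascii")
# ...and a caller who wanted bytes has to encode straight back.
round_tripped = as_text.encode("ascii")

print(same, as_text, round_tripped == raw)
```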

wim answered Oct 13 '22 01:10


It's impossible for b64encode() to know what you want to do with its output.

While in many cases you may want to treat the encoded value as text, in many others – for example, sending it over a network – you may instead want to treat it as bytes.

Since b64encode() can't know, it refuses to guess. And since the input is bytes, the output remains the same type, rather than being implicitly coerced to str.

As you point out, decoding the output to str is straightforward:

base64.b64encode(b'abc').decode('ascii') 

... as well as being explicit about the result.
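Both uses can be illustrated side by side. This is only a sketch: the "payload" key is made up for the example, and io.BytesIO stands in for a socket or a file opened in binary mode.

```python
import base64
import io
import json

encoded = base64.b64encode(b"abc")

# Text use: decode explicitly before embedding in a JSON document.
document = json.dumps({"payload": encoded.decode("ascii")})

# Bytes use: write the encoded value directly to a binary stream.
stream = io.BytesIO()
stream.write(encoded)

print(document)
print(stream.getvalue())
```

In the text case the caller opts in to str with one decode() call; in the bytes case no conversion is needed at all.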

As an aside, it's worth noting that although base64.b64decode() (note: decode, not encode) has accepted str since version 3.3, the change was somewhat controversial.
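A short check of that asymmetry, assuming Python 3.3 or later:

```python
import base64

from_bytes = base64.b64decode(b"YWJj")
from_str = base64.b64decode("YWJj")  # ASCII-only str is accepted

# A str containing non-ASCII characters is rejected rather than guessed at.
try:
    base64.b64decode("YWJj\u00e9")
    rejected = False
except ValueError:
    rejected = True

print(from_bytes, from_str, rejected)
```

Either way the result is bytes; only the accepted input types differ.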

Zero Piraeus answered Oct 13 '22 01:10