Despite the many related questions, I can't find any that match my problem. I'd like to change a binary string (for example, "0110100001101001"
) into a byte array (same example, b"hi"
).
I tried this:
bytes([int(i) for i in "0110100001101001"])
but I got:
b'\x00\x01\x01\x00\x01' #... and so on
What's the correct way to do this in Python 3?
Byte arrays mostly contain binary data such as an image. If the byte array that you are trying to convert to String contains binary data, then none of the text encodings (UTF_8 etc.) will work.
We use int() and hex() to convert a binary number to its equivalent hexadecimal number.
Here's an example of doing it the first way that Patrick mentioned: convert the bitstring to an int and take 8 bits at a time. The natural way to do that generates the bytes in reverse order. To get the bytes back into the proper order I use extended slice notation on the bytearray with a step of -1: b[::-1]
.
def bitstring_to_bytes(s):
v = int(s, 2)
b = bytearray()
while v:
b.append(v & 0xff)
v >>= 8
return bytes(b[::-1])
s = "0110100001101001"
print(bitstring_to_bytes(s))
Clearly, Patrick's second way is more compact. :)
However, there's a better way to do this in Python 3: use the int.to_bytes method:
def bitstring_to_bytes(s):
return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')
If len(s)
is guaranteed to be a multiple of 8, then the first arg of .to_bytes
can be simplified:
return int(s, 2).to_bytes(len(s) // 8, byteorder='big')
This will raise OverflowError
if len(s)
is not a multiple of 8, which may be desirable in some circumstances.
Another option is to use double negation to perform ceiling division. For integers a & b, floor division using //
n = a // b
gives the integer n such that
n <= a/b < n + 1
Eg,47 // 10
gives 4, and
-47 // 10
gives -5. So
-(-47 // 10)
gives 5, effectively performing ceiling division.
Thus in bitstring_to_bytes
we could do:
return int(s, 2).to_bytes(-(-len(s) // 8), byteorder='big')
However, not many people are familiar with this efficient & compact idiom, so it's generally considered to be less readable than
return (s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With