When iterating over a bytes
object in Python 3, one gets the individual bytes
as ints
:
>>> [b for b in b'123'] [49, 50, 51]
How to get 1-length bytes
objects instead?
The following is possible, but not very obvious for the reader and most likely performs bad:
>>> [bytes([b]) for b in b'123'] [b'1', b'2', b'3']
Python bytes is a sequence. Therefore, we can use iterate over it using a looping technique.
Python bytes() Function The bytes() function returns a bytes object. It can convert objects into bytes objects, or create empty bytes object of the specified size.
If you are concerned about performance of this code and an int
as a byte is not suitable interface in your case then you should probably reconsider data structures that you use e.g., use str
objects instead.
You could slice the bytes
object to get 1-length bytes
objects:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
There is PEP 0467 -- Minor API improvements for binary sequences that proposes bytes.iterbytes()
method:
>>> list(b'123'.iterbytes()) [b'1', b'2', b'3']
int.to_bytes
int
objects have a to_bytes method which can be used to convert an int to its corresponding byte:
>>> import sys >>> [i.to_bytes(1, sys.byteorder) for i in b'123'] [b'1', b'2', b'3']
As with some other other answers, it's not clear that this is more readable than the OP's original solution: the length and byteorder arguments make it noisier I think.
struct.unpack
Another approach would be to use struct.unpack, though this might also be considered difficult to read, unless you are familiar with the struct module:
>>> import struct >>> struct.unpack('3c', b'123') (b'1', b'2', b'3')
(As jfs observes in the comments, the format string for struct.unpack
can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring)
is possible.)
Performance
>>> import random, timeit >>> bs = bytes(random.randint(0, 255) for i in range(100)) >>> # OP's solution >>> timeit.timeit(setup="from __main__ import bs", stmt="[bytes([b]) for b in bs]") 46.49886950897053 >>> # Accepted answer from jfs >>> timeit.timeit(setup="from __main__ import bs", stmt="[bs[i:i+1] for i in range(len(bs))]") 20.91463226894848 >>> # Leon's answer >>> timeit.timeit(setup="from __main__ import bs", stmt="list(map(bytes, zip(bs)))") 27.476876026019454 >>> # guettli's answer >>> timeit.timeit(setup="from __main__ import iter_bytes, bs", stmt="list(iter_bytes(bs))") 24.107485140906647 >>> # user38's answer (with Leon's suggested fix) >>> timeit.timeit(setup="from __main__ import bs", stmt="[chr(i).encode('latin-1') for i in bs]") 45.937552741961554 >>> # Using int.to_bytes >>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder", stmt="[x.to_bytes(1, byteorder) for x in bs]") 32.197659170022234 >>> # Using struct.unpack, converting the resulting tuple to list >>> # to be fair to other methods >>> timeit.timeit(setup="from __main__ import bs;from struct import unpack", stmt="list(unpack('100c', bs))") 1.902243083808571
struct.unpack
seems to be at least an order of magnitude faster than other methods, presumably because it operates at the byte level. int.to_bytes
, on the other hand, performs worse than most of the "obvious" approaches.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With