Iterate over individual bytes in Python 3

Tags:

python

python-3.x

When iterating over a bytes object in Python 3, one gets the individual bytes as ints:

>>> [b for b in b'123']
[49, 50, 51]

How can one get 1-length bytes objects instead?

The following is possible, but it is not very obvious to the reader and most likely performs badly:

>>> [bytes([b]) for b in b'123']
[b'1', b'2', b'3']

asked Jan 10 '13 21:01

flying sheep

2 Answers

If you are concerned about the performance of this code and an int as a byte is not a suitable interface in your case, then you should probably reconsider the data structures you use, e.g., use str objects instead.

You could slice the bytes object to get 1-length bytes objects:

L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
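The slicing approach can also be wrapped in a generator so that callers iterate lazily instead of building a list. This is only a minimal sketch: the name iter_single_bytes is hypothetical and not part of the standard library. It relies on the fact that indexing a bytes object yields an int, while slicing it yields bytes.

```python
def iter_single_bytes(data):
    """Yield each byte of `data` as a 1-length bytes object (hypothetical helper)."""
    for i in range(len(data)):
        # Slicing keeps the bytes type; data[i] would give an int instead.
        yield data[i:i+1]


single = list(iter_single_bytes(b'123'))
print(single)  # [b'1', b'2', b'3']
```

An empty input simply yields nothing, since range(0) is empty.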

There is PEP 467 -- Minor API improvements for binary sequences, which proposes a bytes.iterbytes() method. As proposed, it would be used like this:

>>> list(b'123'.iterbytes())
[b'1', b'2', b'3']

answered Oct 18 '22 07:10

jfs

int.to_bytes

int objects have a to_bytes method which can be used to convert an int to its corresponding byte:

>>> import sys
>>> [i.to_bytes(1, sys.byteorder) for i in b'123']
[b'1', b'2', b'3']

As with some of the other answers, it's not clear that this is more readable than the OP's original solution: I think the length and byteorder arguments make it noisier.
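For a length of 1, the byte order cannot change the result, so a fixed literal such as 'big' should work just as well as sys.byteorder. A small check, as a sketch (the variable names are illustrative only):

```python
import sys

data = b'123'

# With length=1 there is only one byte, so every byte order gives the same result.
native = [i.to_bytes(1, sys.byteorder) for i in data]
big = [i.to_bytes(1, 'big') for i in data]
little = [i.to_bytes(1, 'little') for i in data]

print(native == big == little)  # True
```

Using a literal avoids the extra import, though the call still carries both arguments.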

struct.unpack

Another approach would be to use struct.unpack, though this might also be considered difficult to read, unless you are familiar with the struct module:

>>> import struct
>>> struct.unpack('3c', b'123')
(b'1', b'2', b'3')

(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)
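The dynamically built format string might be packaged like this. It is a sketch only: split_bytes is a hypothetical helper name, and the result is converted to a list because struct.unpack returns a tuple.

```python
import struct


def split_bytes(bytestring):
    """Return a list of 1-length bytes objects via struct.unpack (hypothetical helper)."""
    # Build a format such as '3c': a repeat count followed by 'c',
    # which unpacks each byte as a bytes object of length 1.
    fmt = str(len(bytestring)) + 'c'
    return list(struct.unpack(fmt, bytestring))


parts = split_bytes(b'123')
print(parts)  # [b'1', b'2', b'3']
```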

Performance

>>> import random, timeit
>>> bs = bytes(random.randint(0, 255) for i in range(100))

>>> # OP's solution
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bytes([b]) for b in bs]")
46.49886950897053

>>> # Accepted answer from jfs
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bs[i:i+1] for i in range(len(bs))]")
20.91463226894848

>>> # Leon's answer
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="list(map(bytes, zip(bs)))")
27.476876026019454

>>> # guettli's answer
>>> timeit.timeit(setup="from __main__ import iter_bytes, bs",
...               stmt="list(iter_bytes(bs))")
24.107485140906647

>>> # user38's answer (with Leon's suggested fix)
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[chr(i).encode('latin-1') for i in bs]")
45.937552741961554

>>> # Using int.to_bytes
>>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder",
...               stmt="[x.to_bytes(1, byteorder) for x in bs]")
32.197659170022234

>>> # Using struct.unpack, converting the resulting tuple to list
>>> # to be fair to other methods
>>> timeit.timeit(setup="from __main__ import bs;from struct import unpack",
...               stmt="list(unpack('100c', bs))")
1.902243083808571

struct.unpack seems to be at least an order of magnitude faster than other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.
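To repeat a comparison like this on your own machine, a script along the following lines may help. It is a sketch, not the exact benchmark above: the candidates dict and the repetition count of 10,000 are choices made here for illustration, and only approaches that need no helper from other answers are included. Absolute timings will differ between machines and Python versions.

```python
import random
import struct
import timeit

bs = bytes(random.randint(0, 255) for _ in range(100))

# Each candidate maps a label to a callable returning a list of 1-length bytes.
candidates = {
    "bytes([b])": lambda: [bytes([b]) for b in bs],
    "slicing": lambda: [bs[i:i+1] for i in range(len(bs))],
    "int.to_bytes": lambda: [x.to_bytes(1, "big") for x in bs],
    "struct.unpack": lambda: list(struct.unpack(f"{len(bs)}c", bs)),
}

# Sanity check: all approaches must produce the same result before timing them.
results = {name: func() for name, func in candidates.items()}
expected = results["slicing"]
all_agree = all(r == expected for r in results.values())

# Time each candidate and print the fastest first.
timings = {name: timeit.timeit(func, number=10_000) for name, func in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:15s} {seconds:.4f}s")
```

Checking that the results agree before timing guards against comparing approaches that do different work.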

answered Oct 18 '22 07:10

snakecharmerb
