Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iterate over individual bytes in Python 3

When iterating over a bytes object in Python 3, one gets the individual bytes as ints:

>>> [b for b in b'123'] [49, 50, 51] 

How to get 1-length bytes objects instead?

The following is possible, but not very obvious for the reader and most likely performs bad:

>>> [bytes([b]) for b in b'123'] [b'1', b'2', b'3'] 
like image 519
flying sheep Avatar asked Jan 10 '13 21:01

flying sheep


People also ask

Are Python bytes iterable?

Python bytes is a sequence. Therefore, we can use iterate over it using a looping technique.

What does bytes () do in Python?

Python bytes() Function The bytes() function returns a bytes object. It can convert objects into bytes objects, or create empty bytes object of the specified size.


2 Answers

If you are concerned about performance of this code and an int as a byte is not suitable interface in your case then you should probably reconsider data structures that you use e.g., use str objects instead.

You could slice the bytes object to get 1-length bytes objects:

L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))] 

There is PEP 0467 -- Minor API improvements for binary sequences that proposes bytes.iterbytes() method:

>>> list(b'123'.iterbytes()) [b'1', b'2', b'3'] 
like image 67
jfs Avatar answered Oct 18 '22 07:10

jfs


int.to_bytes

int objects have a to_bytes method which can be used to convert an int to its corresponding byte:

>>> import sys >>> [i.to_bytes(1, sys.byteorder) for i in b'123'] [b'1', b'2', b'3'] 

As with some other other answers, it's not clear that this is more readable than the OP's original solution: the length and byteorder arguments make it noisier I think.

struct.unpack

Another approach would be to use struct.unpack, though this might also be considered difficult to read, unless you are familiar with the struct module:

>>> import struct >>> struct.unpack('3c', b'123') (b'1', b'2', b'3') 

(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)

Performance

>>> import random, timeit >>> bs = bytes(random.randint(0, 255) for i in range(100))  >>> # OP's solution >>> timeit.timeit(setup="from __main__ import bs",                   stmt="[bytes([b]) for b in bs]") 46.49886950897053  >>> # Accepted answer from jfs >>> timeit.timeit(setup="from __main__ import bs",                   stmt="[bs[i:i+1] for i in range(len(bs))]") 20.91463226894848  >>>  # Leon's answer >>> timeit.timeit(setup="from __main__ import bs",                    stmt="list(map(bytes, zip(bs)))") 27.476876026019454  >>> # guettli's answer >>> timeit.timeit(setup="from __main__ import iter_bytes, bs",                           stmt="list(iter_bytes(bs))") 24.107485140906647  >>> # user38's answer (with Leon's suggested fix) >>> timeit.timeit(setup="from __main__ import bs",                    stmt="[chr(i).encode('latin-1') for i in bs]") 45.937552741961554  >>> # Using int.to_bytes >>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder",                    stmt="[x.to_bytes(1, byteorder) for x in bs]") 32.197659170022234  >>> # Using struct.unpack, converting the resulting tuple to list >>> # to be fair to other methods >>> timeit.timeit(setup="from __main__ import bs;from struct import unpack",                    stmt="list(unpack('100c', bs))") 1.902243083808571 

struct.unpack seems to be at least an order of magnitude faster than other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.

like image 25
snakecharmerb Avatar answered Oct 18 '22 07:10

snakecharmerb