
Python: bytearray vs array

What is the difference between array.array('B') and bytearray?

from array import array

a = array('B', 'abc')
b = bytearray('abc')

a[0] = 100
b[0] = 'd'

print a
print b

Are there any memory or speed differences? What is the preferred use case of each one?

Ecir Hana asked Aug 09 '12 12:08




4 Answers

bytearray is the successor of Python 2.x's string type. It's basically the built-in byte array type. Unlike the original string type, it's mutable.

The array module, on the other hand, was designed for building binary data structures to communicate with the outside world (for example, to read/write binary file formats).

Unlike bytearray, it supports all kinds of array elements. It's flexible.

So if you just need an array of bytes, bytearray should work fine. If you need flexible formats (say when the element type of the array needs to be determined at runtime), array.array is your friend.

Without looking at the code, my guess would be that bytearray is probably faster since it doesn't have to consider different element types. But it's possible that array('B') returns a bytearray.
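As a rough illustration of the points above, here is a minimal Python 3 sketch. The helper name make_buffer and the sample values are assumptions made up for this example, not part of either API. It shows the element type being picked at runtime, and it also checks what array('B') actually returns.

```python
from array import array


def make_buffer(typecode, values):
    """Build an array whose element type is chosen at runtime (hypothetical helper)."""
    return array(typecode, values)


raw = bytearray(b'abc')                    # always one unsigned byte per element
octets = make_buffer('B', b'abc')          # unsigned bytes, same values as raw
shorts = make_buffer('H', [1, 2, 65535])   # unsigned shorts
doubles = make_buffer('d', [1.5, 2.5])     # C doubles

# array('B') is still an array.array; it is a different type from bytearray.
print(type(octets).__name__, type(raw).__name__)

# The element width depends on the type code; bytearray has no such notion.
print(octets.itemsize, shorts.itemsize, doubles.itemsize)

# Both byte containers hold the same integer values.
print(list(raw) == list(octets))
```

So the two types can hold identical data, but only array.array lets the caller vary the element format.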

Aaron Digulla answered Sep 24 '22 19:09


bytearray has all the usual str methods. You can think of it as a mutable str (bytes in Python 3).

array.array, on the other hand, is geared toward reading and writing files. 'B' is just one special case of array.array.

You can see there is quite a difference by looking at the dir() of each:

>>> dir(bytearray)
['__add__', '__alloc__', '__class__', '__contains__', '__delattr__',
 '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__',
 '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append',
 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'extend',
 'find', 'fromhex', 'index', 'insert', 'isalnum', 'isalpha', 'isdigit', 'islower',
 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans',
 'partition', 'pop', 'remove', 'replace', 'reverse', 'rfind', 'rindex', 'rjust',
 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> dir(array)
['__add__', '__class__', '__contains__', '__copy__', '__deepcopy__',
 '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__',
 '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__',
 '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__',
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__',
 '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append',
 'buffer_info', 'byteswap', 'count', 'extend', 'frombytes', 'fromfile',
 'fromlist', 'fromstring', 'fromunicode', 'index', 'insert', 'itemsize', 'pop',
 'remove', 'reverse', 'tobytes', 'tofile', 'tolist', 'tostring', 'tounicode',
 'typecode']
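To make the difference in method sets concrete, here is a small Python 3 sketch. The sample header text and byte values are invented for illustration. It uses a few of the string-style methods on bytearray, then the file-oriented tofile/fromfile pair on array.array.

```python
import tempfile
from array import array

# bytearray: string-style methods operating on mutable bytes
header = bytearray(b'content-type: text/plain')
name, _, value = header.partition(b': ')   # split once around the separator
header_upper = header.upper()              # returns a new bytearray
header[0:7] = b'Content'                   # in-place slice assignment

# array.array: methods for moving raw elements to and from files
samples = array('B', [1, 2, 3, 250])
with tempfile.TemporaryFile() as f:
    samples.tofile(f)                      # write the raw bytes
    f.seek(0)
    loaded = array('B')
    loaded.fromfile(f, len(samples))       # read back the same number of items

print(name, value, header)
print(loaded == samples)
```

Neither type offers the other's conveniences: bytearray has no tofile, and array.array has no partition or upper.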
John La Rooy answered Sep 21 '22 19:09


Python Patterns - An Optimization Anecdote is a good read which points to array.array('B') as being fast. Using the timing() function from that essay does show that array.array('B') is faster than bytearray():

#!/usr/bin/env python

from array import array
from struct import pack
from timeit import timeit
from time import clock

def timing(f, n, a):
    start = clock()
    for i in range(n):
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    finish = clock()
    return '%s\t%f' % (f.__name__, finish - start)

def time_array(addr):
    return array('B', addr)

def time_bytearray(addr):
    return bytearray(addr)

def array_tostring(addr):
    return array('B', addr).tostring()

def str_bytearray(addr):
    return str(bytearray(addr))

def struct_pack(addr):
    return pack('4B', *addr)

if __name__ == '__main__':
    count = 10000
    addr = '192.168.4.2'
    addr = tuple([int(i) for i in addr.split('.')])
    print('\t\ttiming\t\tfunc\t\tno func')
    print('%s\t%s\t%s' % (timing(time_array, count, addr),
          timeit('time_array((192,168,4,2))', number=count, setup='from __main__ import time_array'),
          timeit("array('B', (192,168,4,2))", number=count, setup='from array import array')))
    print('%s\t%s\t%s' % (timing(time_bytearray, count, addr),
          timeit('time_bytearray((192,168,4,2))', number=count, setup='from __main__ import time_bytearray'),
          timeit('bytearray((192,168,4,2))', number=count)))
    print('%s\t%s\t%s' % (timing(array_tostring, count, addr),
          timeit('array_tostring((192,168,4,2))', number=count, setup='from __main__ import array_tostring'),
          timeit("array('B', (192,168,4,2)).tostring()", number=count, setup='from array import array')))
    print('%s\t%s\t%s' % (timing(str_bytearray, count, addr),
          timeit('str_bytearray((192,168,4,2))', number=count, setup='from __main__ import str_bytearray'),
          timeit('str(bytearray((192,168,4,2)))', number=count)))
    print('%s\t%s\t%s' % (timing(struct_pack, count, addr),
          timeit('struct_pack((192,168,4,2))', number=count, setup='from __main__ import struct_pack'),
          timeit("pack('4B', *(192,168,4,2))", number=count, setup='from struct import pack')))

The timeit measurements show that array.array('B') is sometimes more than twice as fast as bytearray().

I was specifically interested in the fastest way to pack an IP address into a four-byte string for sorting. It looks like neither str(bytearray(addr)) nor array('B', addr).tostring() comes close to the speed of pack('4B', *addr).
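For readers on Python 3, where time.clock and array.tostring() no longer exist and str(bytearray(...)) does not return the raw bytes, the packing step might look like the sketch below. The helper pack_ip and the sample address list are assumptions for illustration. The sketch makes no timing claims; it only checks that the three approaches yield the same four bytes and that the packed form sorts numerically.

```python
from array import array
from struct import pack


def pack_ip(dotted):
    """Pack a dotted-quad IPv4 string into 4 bytes (hypothetical helper)."""
    return pack('4B', *(int(part) for part in dotted.split('.')))


addr = (192, 168, 4, 2)
via_struct = pack('4B', *addr)
via_array = array('B', addr).tobytes()     # tobytes() replaces tostring()
via_bytearray = bytes(bytearray(addr))     # bytes() replaces str() here

# All three produce the same 4-byte value.
print(via_struct == via_array == via_bytearray)

# Packed keys compare byte by byte, so addresses sort numerically
# rather than lexically as text.
addresses = ['192.168.4.2', '10.0.0.1', '192.168.10.1', '9.255.0.0']
ordered = sorted(addresses, key=pack_ip)
print(ordered)
```

Wrapping these expressions in timeit, as in the script above, would show how they compare on a given interpreter.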

yds answered Sep 22 '22 19:09


In my test, both used almost the same amount of memory, but bytearray was about 1.5 times as fast as array when creating a large buffer to read and write.

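from array import array
from time import time

# Python 2 code (xrange, print statement, integer division with /).
size = 256**4/8   # 2**29 one-byte elements

s = time()

# array.array variant, disabled; swap it with the bytearray block to compare
"""
buf = array('B')
for i in xrange(size):
    buf.append(0)
"""

# bytearray variant
buf = bytearray()
for i in xrange(size):
    buf.append(0)
print "init:", time() - s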
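A scaled-down Python 3 sketch of the same comparison follows. The element count n, the helper build_by_append, and the preallocation step are assumptions added for illustration. It builds both buffers by appending, as in the test above, and also shows that each type can be created at full size in one call, which avoids the append loop entirely. sys.getsizeof gives a rough view of the memory used by each.

```python
import sys
from array import array


def build_by_append(factory, n):
    """Grow a buffer one zero byte at a time (hypothetical helper)."""
    buf = factory()
    for _ in range(n):
        buf.append(0)
    return buf


n = 100_000   # assumed size, far smaller than the 256**4/8 used above

grown_bytearray = build_by_append(bytearray, n)
grown_array = build_by_append(lambda: array('B'), n)

# Both types can also be created zero-filled in a single call.
prealloc_bytearray = bytearray(n)
prealloc_array = array('B', bytes(n))

# Reported sizes in bytes; exact values vary by interpreter and platform.
print(sys.getsizeof(prealloc_bytearray), sys.getsizeof(prealloc_array))
print(grown_bytearray == prealloc_bytearray, grown_array == prealloc_array)
```

Timing the two build_by_append calls with timeit would reproduce the speed comparison for a particular Python version.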
salmon answered Sep 24 '22 19:09