Is it possible to effectively initialize bytearray with non-zero value?

I need a huge boolean array, with every value initialized to True:

arr = [True] * (10 ** 9)

Created this way, however, it takes too much memory, so I decided to use a bytearray instead:

arr = bytearray(10 ** 9)  # initialized with zeroes

Is it possible to initialize a bytearray with b'\x01' as efficiently as it is initialized with b'\x00'?

I understand I could initialize the bytearray with zeros and invert my logic, but I'd prefer not to do that if possible.

timeit:

>>> from timeit import timeit
>>> def f1():
...   return bytearray(10**9)
... 
>>> def f2():
...   return bytearray(b'\x01'*(10**9))
... 
>>> timeit(f1, number=100)
14.117428014000325
>>> timeit(f2, number=100)
51.42543800899875

Easy, use sequence multiplication:

arr = bytearray(b'\x01') * 10 ** 9

The same approach works for initializing with zeroes (bytearray(b'\x00') * 10 ** 9), and it's generally preferred, since passing integers to the bytes constructor has been a source of confusion before (people sometimes think it makes a single-element bytes holding the value of the integer).
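
To make the source of that confusion concrete, here is a minimal sketch of how the integer form and the multiplication form behave. The variable names are only illustrative.

```python
# Passing an integer gives that many zero bytes, not one byte with that value.
zeros = bytes(3)
assert zeros == b'\x00\x00\x00'

# A single byte with the value 3 needs an iterable of ints (or a literal).
single = bytes([3])
assert single == b'\x03'

# Multiplying a one-element bytearray spells out both the value and the count.
ones = bytearray(b'\x01') * 5
assert ones == bytearray(b'\x01\x01\x01\x01\x01')

# The zero-filled form is equivalent to the integer constructor.
zeros_again = bytearray(b'\x00') * 5
assert zeros_again == bytearray(5)
```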

You want to create the single-element bytearray first and then multiply it, rather than multiplying the bytes and passing the result to the bytearray constructor. That avoids doubling your peak memory requirements (and avoids reading from one huge array and writing to another, on top of the simple memset-like fill of a single array that any solution requires).

In my local tests, bytearray(b'\x01') * 10 ** 9 runs exactly as fast as bytearray(10 ** 9); both took ~164 ms per loop, vs. 434 ms for multiplying the bytes object and then passing it to the bytearray constructor.
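
If you want to repeat the comparison yourself, something along these lines should work. It is only a sketch: N is scaled down from 10 ** 9 so it finishes quickly (my assumption, not part of the measurements above), the function names are made up for illustration, and absolute numbers will vary by machine.

```python
from timeit import timeit

N = 10 ** 7  # scaled down from the 10**9 in the question (assumption)
RUNS = 10


def zero_fill():
    # Baseline: zero-initialized bytearray.
    return bytearray(N)


def multiply_bytearray():
    # One-element bytearray expanded in a single step.
    return bytearray(b'\x01') * N


def multiply_bytes_then_convert():
    # Builds a large bytes object first, then copies it into a bytearray.
    return bytearray(b'\x01' * N)


# Both one-filled variants produce identical contents.
assert multiply_bytearray() == multiply_bytes_then_convert()

for func in (zero_fill, multiply_bytearray, multiply_bytes_then_convert):
    seconds = timeit(func, number=RUNS)
    print(f"{func.__name__}: {seconds / RUNS * 1000:.1f} ms per call")
```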

Consider using NumPy for this sort of thing. On my computer, np.ones (which initializes an array of all-1 values) with a boolean dtype is just as fast as the bare bytearray constructor:

>>> import numpy as np
>>> from timeit import timeit
>>> def f1(): return bytearray(10**9)
>>> def f2(): return np.ones(10**9, dtype=bool)
>>> timeit(f1, number=100)
24.9679438900057
>>> timeit(f2, number=100)
24.732190757000353

If you don't want to use third-party modules, another option with competitive performance is to create a one-element bytearray and then expand that, instead of creating a large byte-string and converting it to a bytearray.

>>> def f3(): return bytearray(b'\x01')*(10**9)
>>> timeit(f3, number=100)
24.842667759003234

Since my computer appears to be slower than yours, here is the performance of your original option for comparison:

>>> def fX(): return bytearray(b'\x01'*(10**9))
>>> timeit(fX, number=100)
56.61828187300125

The cost in all cases is going to be dominated by allocating a decimal gigabyte of RAM and writing to every byte of it. fX takes roughly twice as long as the other three functions because it has to do this twice. A good rule of thumb when working with code like this: minimize the number of allocations. It may be worth dropping down to a lower-level language in which you can explicitly control allocation (if you don't know any such language already, I recommend Rust).
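
The "does it twice" point can be checked by measuring peak allocation with the standard-library tracemalloc module. This is a rough sketch, assuming CPython; N is scaled down from 10 ** 9 for a quick run, and peak_bytes, expand_bytearray and convert_big_bytes are illustrative names of my own.

```python
import tracemalloc

N = 10 ** 7  # scaled down from 10**9 (assumption, for a quick run)


def peak_bytes(func):
    """Return the peak traced allocation, in bytes, while func runs."""
    tracemalloc.start()
    try:
        func()
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


def expand_bytearray():
    # Only one large buffer should exist at any time.
    return bytearray(b'\x01') * N


def convert_big_bytes():
    # A large bytes object and the bytearray copy of it coexist briefly.
    return bytearray(b'\x01' * N)


peak_expand = peak_bytes(expand_bytearray)
peak_convert = peak_bytes(convert_big_bytes)

print(f"expand one-element bytearray: {peak_expand / N:.2f} x N bytes at peak")
print(f"convert large bytes object:   {peak_convert / N:.2f} x N bytes at peak")
```

On a typical CPython build, I'd expect the second figure to come out around twice the first, but treat that as something to verify on your own interpreter rather than a guarantee.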

Tags:

python

python-3.x

python-bytearray

Mikhail M.

2 Answers

ShadowRanger

zwol
