I need to create a large bytearray of a specific size, but the size is not known prior to run time. The bytes need to be fairly random. The bytearray size may be as small as a few KB but as large as several MB. I do not want to iterate byte-by-byte: that is too slow -- I need performance similar to numpy.random. However, I do not have the numpy module available for this project. Is there something in a standard Python install that will do this, or do I need to compile my own extension in C?
For those asking for timings:

>>> timeit.timeit('[random.randint(0,128) for i in xrange(1,100000)]', setup='import random', number=100)
35.73110193696641
>>> timeit.timeit('numpy.random.random_integers(0,128,100000)', setup='import numpy', number=100)
0.5785652013481126
os.urandom(size) returns a string of size random bytes suitable for cryptographic use.

The bytearray type is a mutable sequence of integers in the range 0 to 255. It lets you work directly with binary data, such as the bytes inside an image or data arriving from the network, and it inherits methods from both the list and str types.
The difference between bytes() and bytearray() is that bytes() returns an immutable object, while bytearray() returns a mutable one.
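A minimal illustration of that difference (my own example, Python 3 syntax):

# bytearray can be modified in place, bytes cannot.
buf = bytearray(b'\x00\x01\x02')
buf[0] = 255            # fine: bytearray is mutable
frozen = bytes(buf)     # bytes() gives an immutable snapshot
# frozen[0] = 255       # would raise TypeError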
The os module provides urandom, even on Windows:
bytearray(os.urandom(1000000))
This seems to perform as quickly as you need; in fact, I get better timings than your numpy run (though our machines could be wildly different):
>>> timeit.timeit(lambda: bytearray(os.urandom(1000000)), number=10)
0.0554857286941
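If you want to compare on your own hardware, a small benchmark along these lines should do it (my own sketch, Python 3 syntax; it roughly mirrors the question's byte-by-byte loop against os.urandom, with sizes and repeat counts chosen by me):

import os
import random
import timeit

N = 100000  # bytes per call, as in the question's timings

per_byte = timeit.timeit(
    lambda: bytearray(random.randint(0, 255) for _ in range(N)), number=10)
bulk = timeit.timeit(
    lambda: bytearray(os.urandom(N)), number=10)

print("randint loop: %.3fs   os.urandom: %.3fs" % (per_byte, bulk))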
There are several possibilities, some faster than os.urandom. Also consider whether the data has to be generated deterministically from a random seed; this is invaluable for unit tests, where failures have to be reproducible.
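For the deterministic case, one option (my own sketch, not part of the answer's timings) is to draw the bytes from a seeded random.Random instance, so the same seed always reproduces the same data:

import random

def seeded_bytes(n, seed=0):
    # Deterministic pseudorandom bytes: the same seed always gives
    # the same output, which is handy for reproducible unit tests.
    rng = random.Random(seed)
    return bytearray(rng.getrandbits(8) for _ in range(n))

assert seeded_bytes(16, seed=42) == seeded_bytes(16, seed=42)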
short and pithy:
lambda n:bytearray(map(random.getrandbits,(8,)*n))
I've used the above for unit tests and it was fast enough, but can it be done faster?
using itertools:
lambda n:bytearray(itertools.imap(random.getrandbits,itertools.repeat(8,n)))
itertools and struct, producing 8 bytes per iteration:
lambda n:(b''.join(map(struct.Struct("!Q").pack,itertools.imap( random.getrandbits,itertools.repeat(64,(n+7)//8)))))[:n]
Anything based on b''.join will fill 3-7x the memory consumed by the final bytearray with temporary objects, since it queues up all the sub-strings before joining them together and Python objects have a lot of storage overhead.
Producing large chunks with a specialized function gives better performance and avoids filling memory.
import random, itertools, struct, operator

def randbytes(n, _struct8k=struct.Struct("!1000Q").pack_into):
    if n < 8000:
        longs = (n + 7) // 8
        return struct.pack("!%iQ" % longs, *map(
            random.getrandbits, itertools.repeat(64, longs)))[:n]
    data = bytearray(n)
    for offset in xrange(0, n - 7999, 8000):
        _struct8k(data, offset,
                  *map(random.getrandbits, itertools.repeat(64, 1000)))
    offset += 8000
    data[offset:] = randbytes(n - offset)
    return data
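Since the code above is Python 2 (xrange, itertools.imap), here is a rough Python 3 adaptation of the same chunked approach. This is my own sketch, not the original author's code, and the function name is mine:

import itertools
import random
import struct

def randbytes_py3(n, _struct8k=struct.Struct("!1000Q").pack_into):
    # Small requests: pack (n+7)//8 64-bit values and trim to n bytes.
    if n < 8000:
        longs = (n + 7) // 8
        return struct.pack("!%iQ" % longs,
                           *map(random.getrandbits,
                                itertools.repeat(64, longs)))[:n]
    # Large requests: fill 8000-byte chunks in place, then recurse for the tail.
    data = bytearray(n)
    for offset in range(0, n - 7999, 8000):
        _struct8k(data, offset,
                  *map(random.getrandbits, itertools.repeat(64, 1000)))
    offset += 8000
    data[offset:] = randbytes_py3(n - offset)
    return data

data = randbytes_py3(10**6)  # ~1 MB of pseudorandom bytes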
The approaches compared were:

- randint
- bytearray(getrandbits(8) for _ in xrange(n))
- (solution by another poster) bytearray(map(getrandbits,(8,)*n))
- itertools and getrandbits
- os.urandom
- itertools and struct

Note: all tests used 10 KB as the string size. Results were consistent up until intermediate results filled memory.
Note: os.urandom is meant to provide secure random seeds. Applications expand that seed with their own fast PRNG. Here's an example, using AES in counter mode as a PRNG:
import os
seed = os.urandom(32)

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
backend = default_backend()
cipher = Cipher(algorithms.AES(seed), modes.CTR(b'\0'*16), backend=backend)
encryptor = cipher.encryptor()

nulls = b'\0'*(10**5)  # 100K

from timeit import timeit
t = timeit(lambda: encryptor.update(nulls), number=10**4)  # 1GB (100K*10K)
print("%.1f MB/s" % (1000/t))
This produces pseudorandom data at 180 MB/s (no hardware AES acceleration, single core). That's only ~5x the speed of the pure Python code above.
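To actually turn that keystream into a buffer of the requested size, something along these lines should work (my own usage sketch, relying on the encryptor object from the block above; not part of the original answer):

def aes_ctr_randbytes(n):
    # Encrypting n zero bytes under AES-CTR yields n bytes of keystream,
    # which serves as fast, seedable pseudorandom data.
    return bytearray(encryptor.update(b'\0' * n))

buf = aes_ctr_randbytes(10**6)  # ~1 MB of pseudorandom bytes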
There's a pure Python crypto library waiting to be written. Putting the above techniques together with hashlib and stream cipher techniques looks promising. Here's a teaser, a fast string XOR (42 MB/s):
def xor(a, b):
    s = "!%iQ%iB" % divmod(len(a), 8)
    return struct.pack(s, *itertools.imap(operator.xor,
                                          struct.unpack(s, a),
                                          struct.unpack(s, b)))
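As a quick sanity check of how it would be used (my own example, same Python 2 environment as the function; note it assumes both inputs have the same length, since the unpack format is derived from len(a) only):

key = b'\x0f' * 16
msg = b'attack at dawn!!'       # same length as key
ct = xor(msg, key)
assert xor(ct, key) == msg      # XOR with the same keystream round-trips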