I'm trying to use Python to create a random binary file. This is what I've got so far:
    f = open(filename, 'wb')
    for i in xrange(size_kb):
        for ii in xrange(1024/4):
            f.write(struct.pack("=I", random.randint(0, sys.maxint*2+1)))
    f.close()
But it's terribly slow (0.82 seconds for size_kb=1024 on my machine, which has a 3.9GHz CPU and an SSD). A big bottleneck seems to be the random int generation (replacing the randint() call with a constant 0 reduces the running time from 0.82s to 0.14s).
Now I know there are more efficient ways of creating random data files (namely dd if=/dev/urandom), but I'm trying to figure this out for the sake of curiosity. Is there an obvious way to improve this?
Initialize an empty string, say S. Iterate over the range [0, N – 1] and, at each step, generate a random integer that is either 0 or 1 using the rand() function and append it to the end of S.
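The steps above could be sketched in Python roughly as follows. This is only an illustration: the function name random_binary_string is made up here, and random.randint(0, 1) stands in for the rand() call mentioned in the text.

```python
import random


def random_binary_string(n):
    """Return a string of n characters, each either '0' or '1'.

    Hypothetical helper name; random.randint(0, 1) plays the role of rand().
    """
    bits = []
    for _ in range(n):
        # Pick 0 or 1 with equal probability and keep it as a character.
        bits.append(str(random.randint(0, 1)))
    # Joining once at the end avoids repeated string concatenation.
    return "".join(bits)


if __name__ == "__main__":
    print(random_binary_string(16))
```

Note that this produces a text string of '0' and '1' characters, not raw binary data, so it is a different thing from the random file the question asks about.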
IMHO - the following is completely redundant:
    f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))
There's absolutely no need to use struct.pack; just do something like:
    import os

    with open('output_file', 'wb') as fout:
        fout.write(os.urandom(1024))  # replace 1024 with size_kb * 1024 if not unreasonably large
Then, if you need to re-use the file for reading integers, use struct.unpack at that point.
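As a rough sketch of that idea, written for Python 3: write the file in 1 KB chunks of os.urandom output so that a large size_kb does not need one huge allocation, and only interpret the bytes as integers when reading. The helper names write_random_file and read_uint32s are assumptions for illustration, and the "=I" format is carried over from the question.

```python
import os
import struct


def write_random_file(path, size_kb):
    """Write size_kb kilobytes of random bytes to path, 1 KB at a time."""
    with open(path, "wb") as fout:
        for _ in range(size_kb):
            fout.write(os.urandom(1024))


def read_uint32s(path):
    """Read the file back as a tuple of native-order unsigned 32-bit ints."""
    with open(path, "rb") as fin:
        data = fin.read()
    count = len(data) // 4  # number of whole 4-byte integers available
    # "=%dI" unpacks `count` unsigned ints with standard size and no padding.
    return struct.unpack("=%dI" % count, data[:count * 4])


if __name__ == "__main__":
    write_random_file("output_file", 1024)
    values = read_uint32s("output_file")
    print(len(values))
```

The packing step disappears entirely on the write side, because the random bytes are already in the form the file needs.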
(my use case is generating a file for a unit test so I just need a file that isn't identical with other generated files).
Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.
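If uniqueness is all the unit test needs, the UUID4 idea might look something like this. The function name write_unique_file is hypothetical, and the file ends up only 16 bytes long, so this does not help when the test depends on a particular file size.

```python
import uuid


def write_unique_file(path):
    """Write the 16 raw bytes of a fresh UUID4 to path and return the UUID."""
    token = uuid.uuid4()
    with open(path, "wb") as fout:
        fout.write(token.bytes)
    return token


if __name__ == "__main__":
    print(write_unique_file("output_file"))
```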