I need to generate a binary file containing only unique random numbers, in single precision. The purpose is then to calculate the entropy of this file and use it, together with the entropy of other datasets, to compute the ratio entropy_file/entropy_randUnique. I call this value "randomness".
I can do this in Python with double-precision numbers by packing them with struct.pack and inserting them into a set(), like so:

```python
import random
import struct

numbers = set()
while len(numbers) < size:
    numbers.add(struct.pack('d', random.random()))
with open('out.bin', 'wb') as file:
    for num in numbers:
        file.write(num)
```
but when I change to single precision I can't just change the format string in pack (that produces many duplicate values, so the while loop never ends), and I can't generate single-precision numbers with random. I've looked into numpy, but its generator works the same way from what I understood.

How can I get 370914252 (this is my biggest test case) unique float32 values into a binary file? They don't even have to be random; I think a shuffled sequence would suffice.
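The collision problem described above is easy to reproduce: uniform doubles packed as float32 land on far fewer distinct bit patterns than there are draws. A small sketch (float32_collisions is a hypothetical helper name, not part of the question's code):

```python
import random
import struct

def float32_collisions(n, seed=0):
    """Pack n uniform doubles as float32 and count how many
    duplicates the set discards."""
    rng = random.Random(seed)
    packed = {struct.pack('f', rng.random()) for _ in range(n)}
    return n - len(packed)
```

With a million draws this already reports thousands of duplicates, which is why a loop waiting for `len(numbers) < size` to become false can stall.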
Your best bet is to generate random 32-bit integers, then reinterpret them as floating point. You'll need to reject bit patterns representing infinity and NaN as you generate the numbers.

You can build your set from the integer values rather than the floating-point ones, then do the conversion on output. Rather than using a set, you can use a bitmap to record which integer values have already been used; at one bit per possible 32-bit pattern that's 512 MiB, which is far more likely to fit in memory, especially given the largest sample size you indicate.
```python
import math
import random
import struct

def random_unique_floats(n):
    # One bit per possible 32-bit pattern: 2**32 / 8 bytes = 512 MiB.
    used = bytearray(2**32 // 8)
    count = 0
    while count < n:
        bits = random.getrandbits(32)
        # Reinterpret the integer's bits as a float32 value.
        value = struct.unpack('f', struct.pack('I', bits))[0]
        if not math.isinf(value) and not math.isnan(value):
            index = bits // 8
            mask = 0x01 << (bits & 0x07)
            if used[index] & mask == 0:
                yield value
                used[index] |= mask
                count += 1

with open('out.bin', 'wb') as file:
    for num in random_unique_floats(size):
        file.write(struct.pack('f', num))
```
Note that as your number of samples approaches the number of possible floating-point values, the rejection rate climbs and the run time grows sharply (the coupon-collector effect), since ever more draws hit already-used patterns.
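Since the question says true randomness isn't required, the shuffled-sequence idea can be sketched with numpy: consecutive uint32 bit patterns starting at 0 all decode to distinct finite float32 values up to the infinity pattern 0x7F800000, so no inf/NaN filtering is needed for counts below that bound. (write_shuffled_float32 is a hypothetical helper; it needs roughly 4 bytes of memory per value.)

```python
import numpy as np

def write_shuffled_float32(path, n, seed=0):
    """Write n distinct finite float32 values, in shuffled order,
    to a binary file. The values are not uniformly random -- they
    are the first n non-negative float32 bit patterns."""
    assert n <= 0x7F800000  # patterns below this are all finite
    values = np.arange(n, dtype=np.uint32).view(np.float32)
    rng = np.random.default_rng(seed)
    rng.shuffle(values)
    values.tofile(path)
```

For the 370914252-value test case this stays well under the 0x7F800000 bound and writes about 1.4 GiB, with no rejection loop and no 512 MiB bitmap.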