I need to generate a binary file containing only unique random numbers, in single precision. The purpose is then to calculate the entropy of this file and use it, together with the entropy of other datasets, to compute the ratio entropy_file/entropy_randUnique. I call this value "randomness".
I can do this in Python with double-precision numbers by packing them with struct.pack and inserting them into a set(), like so:

```python
import random
import struct

numbers = set()
while len(numbers) < size:
    numbers.add(struct.pack('d', random.random()))
with open('out.bin', 'wb') as file:
    for num in numbers:
        file.write(num)
```
but when I change to single precision I can't just change the format string in pack (that produces many duplicate values, so the while loop never ends), and I can't generate single-precision numbers with random. I've looked into numpy, but its generator works the same way from what I understood.

How can I get 370914252 (this is my biggest test case) unique float32 values into a binary file? They don't even have to be random; I think a shuffled sequence would suffice.
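The collision problem described above is easy to reproduce: uniform doubles packed as float32 land on far fewer distinct bit patterns than there are draws. A small sketch (float32_collisions is a hypothetical helper name, not part of the question's code):

```python
import random
import struct

def float32_collisions(n, seed=0):
    """Pack n uniform doubles as float32 and count how many
    duplicates the set discards."""
    rng = random.Random(seed)
    packed = {struct.pack('f', rng.random()) for _ in range(n)}
    return n - len(packed)
```

With a million draws this already reports thousands of duplicates, which is why a loop waiting for `len(numbers) < size` to become false can stall.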
Your best bet is to generate random 32-bit integers, then reinterpret them as floating point. You'll need to reject bit patterns representing infinity and NaN as you generate the numbers.

You can build your set from the integer values rather than the floating-point ones, then do the conversion on output. Rather than using a set, you can use a bitmap to record which integer values have already been used; at one bit per possible 32-bit pattern that's 512 MiB, which is far more likely to fit in memory, especially given the largest sample size you indicate.
```python
import math
import random
import struct

def random_unique_floats(n):
    # One bit per possible 32-bit pattern: 2**32 / 8 bytes = 512 MiB.
    used = bytearray(2**32 // 8)
    count = 0
    while count < n:
        bits = random.getrandbits(32)
        # Reinterpret the integer's bits as a float32 value.
        value = struct.unpack('f', struct.pack('I', bits))[0]
        if not math.isinf(value) and not math.isnan(value):
            index = bits // 8
            mask = 0x01 << (bits & 0x07)
            if used[index] & mask == 0:
                yield value
                used[index] |= mask
                count += 1

with open('out.bin', 'wb') as file:
    for num in random_unique_floats(size):
        file.write(struct.pack('f', num))
```
Note that as your number of samples approaches the number of possible floating-point values, the rejection rate climbs and the run time grows sharply (the coupon-collector effect), since ever more draws hit already-used patterns.
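Since the question says true randomness isn't required, the shuffled-sequence idea can be sketched with numpy: consecutive uint32 bit patterns starting at 0 all decode to distinct finite float32 values up to the infinity pattern 0x7F800000, so no inf/NaN filtering is needed for counts below that bound. (write_shuffled_float32 is a hypothetical helper; it needs roughly 4 bytes of memory per value.)

```python
import numpy as np

def write_shuffled_float32(path, n, seed=0):
    """Write n distinct finite float32 values, in shuffled order,
    to a binary file. The values are not uniformly random -- they
    are the first n non-negative float32 bit patterns."""
    assert n <= 0x7F800000  # patterns below this are all finite
    values = np.arange(n, dtype=np.uint32).view(np.float32)
    rng = np.random.default_rng(seed)
    rng.shuffle(values)
    values.tofile(path)
```

For the 370914252-value test case this stays well under the 0x7F800000 bound and writes about 1.4 GiB, with no rejection loop and no 512 MiB bitmap.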