Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Random list of numbers in a range keeping with a minimum distance

Let's say this code random.seed(42) random.sample(range(0,40), 4) Output:[7, 1, 17, 15] What should I change in this code to generate random numbers where the minimum distance between any two numbers in the list will be be at least 10 or more. Something like [0, 10, 25, 39] or [0, 12, 23, 38 ]. Possible duplicate would be this. Thanks.

like image 788
Humaun Rashid Nayan Avatar asked Aug 19 '18 14:08

Humaun Rashid Nayan


1 Answers

One-line solution for the sorted case

Here's a simple one-liner, that generates all possibilities with equal likelihood:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

A few sample outputs:

[2, 16, 26, 38]
[0, 10, 25, 35]
[2, 12, 25, 36]
[0, 13, 26, 39]
[1, 14, 24, 34]
[1, 11, 29, 39]
[0, 13, 26, 39]
[1, 12, 27, 38]

Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).

Explanation: if [a, b, c, d] is an ordered list satisfying your requirements, then [a, b-9, c-18, d-27] is an ordered sample of length 4 from range(13), and vice versa. So all you need to do is generate samples from range(13), sort them, and then re-add the necessary multiples of 9 to get values that are at least 10 apart.

General unsorted solution

Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.

import random

def ranks(sample):
    """
    Return the ranks of each element in an integer sample.
    """
    indices = sorted(range(len(sample)), key=lambda i: sample[i])
    return sorted(indices, key=lambda i: indices[i])

def sample_with_minimum_distance(n=40, k=4, d=10):
    """
    Sample of k elements from range(n), with a minimum distance d.
    """
    sample = random.sample(range(n-(k-1)*(d-1)), k)
    return [s + (d-1)*r for s, r in zip(sample, ranks(sample))]

And some sample outputs:

>>> sample_with_minimum_distance()
[17, 27, 3, 38]
>>> sample_with_minimum_distance()
[27, 38, 10, 0]
>>> sample_with_minimum_distance()
[36, 13, 1, 24]
>>> sample_with_minimum_distance()
[1, 25, 15, 39]
>>> sample_with_minimum_distance()
[26, 12, 1, 38]

The "cheap trick" solution

If the various constants here in the original problem are fixed (population range(40), samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only 715 possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list using random.choice.

For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:

>>> import itertools
>>> all_samples = [  # inefficient brute-force solution
...     sample for sample in itertools.product(range(40), repeat=4)
...     if all(x - y >= 10 for x, y in zip(sample[1:], sample))
... ]
>>> len(all_samples)
715

This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.

>>> all_samples = [
...     [9*i + s for i, s in enumerate(sample)]
...     for sample in itertools.combinations(range(13), 4)
... ]
>>> len(all_samples)
715

Either way, we generate the list of samples just once, and then use random.choice to pick one every time we need it:

>>> random.choice(all_samples)
(1, 11, 21, 38)
>>> random.choice(all_samples)
(0, 10, 23, 33)

Of course, this solution doesn't scale well: for 7 samples out of range(100) with a minimum distance of 5, there are over 2 billion possible different sorted samples.

Demonstration of uniformity

I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.

First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a tuple instead of a list, because for the next step we want something hashable.

>>> def sorted_sample():
...     return tuple(9*i + x for i, x in
...                  enumerate(sorted(random.sample(range(13), 4))))

Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:

>>> from collections import Counter
>>> samples = Counter(sorted_sample() for _ in range(10**7))

A couple of quick checks:

>>> len(samples)
715
>>> 10**7 / 715
13986.013986013986
>>> samples[0, 10, 20, 30]
14329
>>> samples[0, 11, 22, 33]
13995
>>> min(samples.values())
13624
>>> max(samples.values())
14329

We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately 10**7 / 715 times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.

Is that random variation within acceptable limits? To find out, we can do a chi-squared test with p = 0.01. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.

SciPy makes a chi-squared test for uniformity easy:

>>> from scipy.stats import chisquare
>>> chisquare(list(samples.values()))
Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)

The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.

like image 190
Mark Dickinson Avatar answered Sep 30 '22 06:09

Mark Dickinson