How can we efficiently generate k
random and non-consecutive samples out of [1,...,N]
?
Non-desired Example with (N=10, k=4)
:
2,3,8,10
This is not a desired example since 2
and 3
are consecutive.
Desired Example with (N=10, k=4)
:
2,6,8,10
This is a good example since the difference between every pair of samples is greater than 1
sort(randperm(N-(k-1),k))+[0:(k-1)]
There is a simple obersavation behind this solution, if you take any sorted solution to your problem and substract [0:(k-1)]
, you end up with a random choice of k numbers out of N-(k-1)
Let S denote the set of all k
-element vectors with values taken from [1,...,N]
that don't have any consecutive values. To randomly sample with a uniform distribution over S, you can use a rejection method:
In Matlab, it's easy to generate uniformly distributed k
-element vectors with values taken from [1,...,N]
without replacement (function randsample
). So this is used as the sample space T:
k = 4;
N = 10;
result = [1 1]; % // just to get while loop started
while any(diff(result)<=1) % // does not meet condition: try again
result = sort(randsample(N, k).'); %'// random sample without replacement
end
A Python class which correctly checks every pair of samples. You're responsible for not passing it a set of numbers that is impossible, though (like N = 10, k = 100).
>>> class NonConsecutiveSampler(object):
def __init__(self,N):
import random
self.num = N
def get_samples(self,k):
possibilities = [i for i in range(1,self.num + 1)]
samples = []
while len(samples) < k:
r = random.sample(possibilities,1)[0]
samples.append(r)
for i in range(r - 1, r + 2):
if i in possibilities:
possibilities.remove(i)
samples.sort()
return samples
>>> n = NonConsecutiveSampler(10)
>>> n.get_samples(4)
[2, 5, 8, 10]
>>> n.get_samples(4)
[1, 5, 7, 10]
>>> n.get_samples(4)
[3, 6, 8, 10]
>>> n.get_samples(4)
[1, 3, 5, 8]
EDIT: Made it much more efficient
Sometimes it's faster and easier to generate more samples than you need and then throw away the undesireable values.
One (slow) example.
vec= randi(100,1,1);
for j = 2:50,
while(abs(vec(j)-vec(j-1)<2) vec(j)= randi(100,1,1);end;
end
Another way. Suppose you want 50 samples
vec = rand(100,100,1);
badindex = find(abs(vec(1:99)-vec(2:100) < 1));
vec(badindex) = vec(badindex-1)+vec(badindex+1);
% if you don't want big values,
vec(vec>100) = vec (vec>100) -100; % to ensure, I hope, that neighbors
% are nonconsecutive
(This would be easier in R
) .
You can make the increment between samples evenly distributed between 2 and N-1 (to avoid consecutive and repeated numbers):
N=10;
k=4;
increments = floor(rand(1,k)*(N-2))+2 %// increments allowed are from 2 to N-1 inclusive
out = mod(cumsum(increments), N)+1 %// sum increments
Same in python:
from numpy import cumsum, floor, mod, random
N=5
k=100
increments = floor(random.rand(1,k)*(N-2))+2
out = mod(cumsum(increments), N)+1
print(out)
[ 5. 3. 1. 5. 2. 4. 3. 2. 4. 2. 4. 3. 1. 5. 4. 3. 5. 4.
2. 5. 4. 2. 5. 2. 4. 1. 5. 4. 1. 5. 3. 1. 3. 2. 4. 1.
5. 4. 1. 3. 5. 4. 3. 5. 2. 1. 3. 2. 4. 3. 1. 4. 2. 1.
3. 2. 1. 4. 3. 2. 1. 3. 5. 3. 5. 4. 2. 4. 2. 1. 3. 2.
1. 3. 5. 2. 5. 4. 3. 1. 4. 1. 4. 3. 5. 4. 2. 1. 5. 2.
1. 5. 4. 2. 4. 3. 5. 2. 4. 1.]
Over 100 iterations, even if I limit the number to 1..5, there is no repeated/consecutive number.
A solution in MATLAB (perhaps inelegant) could be something like this:
N = 10;
k = 4;
out = zeros(1,k);
vec = 1 : N;
for idx = 1 : k
ind = randi(numel(vec), 1);
left = max(ind-1, 1); right = min(numel(vec), ind+1);
out(idx) = vec(ind);
to_remove = ind;
if vec(left) == vec(ind)-1
to_remove = [to_remove left];
end
if vec(right) == vec(ind)+1
to_remove = [to_remove right];
end
vec(to_remove) = [];
end
We first declare N
and k
, then declare an output array of zeroes that is k
long. We then generate a sampling vector vec
that goes from 1 up to as N
initially. Next, for each value we want to place into the output, we generate a random position to sample from the vector, then take a look at the position from the left and from the right... ensuring that we are within the boundaries of the array. Also, we only remove to the left or right if the value at the left of the index to remove and also the right are equal to each other (thanks beaker!)
We use this location and sample from this vector, place the value at this location to the output, then remove the indices in this vector that are to the left, to the right, and the actual index itself from this vector. This removes the possibility of sampling from those values again. We repeat this until we run out of values to place in the output.
Here are a few of trial runs:
>> out
out =
9 7 1 5
>> out
out =
7 1 4 10
>> out
out =
10 8 1 6
>> out
out =
10 4 8 1
A not-particularly-elegant python solution:
def nonseq(n, k):
out = [random.randint(0, n)]
while len(out) < k:
x = random.randint(0, n)
if abs(x - out[-1]) > 1:
out.append(x)
return out
This is a recursive elegant version, I just added a check on k and N to avoid infinite recursion, if k>N/2 no solution exists.
The result is guaranteed random.
import random
def myFunc(N,k):
if k>(N+1)/2:
return "k to big for N"
returnValue = sorted(random.sample(range(1,N+1),k))
toTest = [x - returnValue[i - 1] for i, x in enumerate(returnValue)][1:]
if 1 in toTest:
return myFunc(N,k)
else:
return returnValue
print myFunc(10,4)
My implementation:
def ncsample(population, k):
import random
if k > 0:
i = random.randrange(0, len(population) - 2*(k-1))
return [population[i]] + ncsample(population[i+2:], k-1)
else:
return []
Note: it randomly finds the sequence in one shot (no rejection sampling in a while loop).
MATLAB implementation:
function r = ncsample(population, k)
if k > 0
i = randi(length(population) - 2*(k-1));
r = [population(i) ncsample(population((i+2):end), k-1)];
else
r = [];
end
end
Some tests:
>> for i=1:10; fprintf('%s\n',sprintf('%d ', ncsample(1:10, 4))); end
1 5 7 9
3 5 8 10
3 5 8 10
4 6 8 10
2 6 8 10
1 4 8 10
1 4 7 9
3 6 8 10
1 6 8 10
2 4 7 9
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With