Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

generate non-consecutive samples

Tags:

random

matlab

How can we efficiently generate k random and non-consecutive samples out of [1,...,N]?

Non-desired Example with (N=10, k=4): 2,3,8,10

This is not a desired example since 2 and 3 are consecutive.

Desired Example with (N=10, k=4): 2,6,8,10

This is a good example since the difference between every pair of samples is greater than 1

like image 849
A.M. Avatar asked Jun 25 '15 19:06

A.M.


9 Answers

sort(randperm(N-(k-1),k))+[0:(k-1)]

There is a simple obersavation behind this solution, if you take any sorted solution to your problem and substract [0:(k-1)], you end up with a random choice of k numbers out of N-(k-1)

like image 169
Daniel Avatar answered Oct 02 '22 12:10

Daniel


Let S denote the set of all k-element vectors with values taken from [1,...,N] that don't have any consecutive values. To randomly sample with a uniform distribution over S, you can use a rejection method:

  1. Sample uniformly on a larger sample space, T.
  2. If the sample belongs to the target region S, accept the sample. Else return to step 1 (the sample is rejected).

In Matlab, it's easy to generate uniformly distributed k-element vectors with values taken from [1,...,N] without replacement (function randsample). So this is used as the sample space T:

k = 4;
N = 10;
result = [1 1];                         % // just to get while loop started
while any(diff(result)<=1)              % // does not meet condition: try again
    result = sort(randsample(N, k).');  %'// random sample without replacement
end
like image 25
Luis Mendo Avatar answered Oct 02 '22 14:10

Luis Mendo


A Python class which correctly checks every pair of samples. You're responsible for not passing it a set of numbers that is impossible, though (like N = 10, k = 100).

>>> class NonConsecutiveSampler(object):
        def __init__(self,N):
                import random
                self.num = N
        def get_samples(self,k):
                possibilities = [i for i in range(1,self.num + 1)]
                samples = []
                while len(samples) < k:
                        r = random.sample(possibilities,1)[0]
                        samples.append(r)
                        for i in range(r - 1, r + 2):
                                if i in possibilities:
                                        possibilities.remove(i)
                samples.sort()
                return samples


>>> n = NonConsecutiveSampler(10)
>>> n.get_samples(4)
[2, 5, 8, 10]
>>> n.get_samples(4)
[1, 5, 7, 10]
>>> n.get_samples(4)
[3, 6, 8, 10]
>>> n.get_samples(4)
[1, 3, 5, 8]

EDIT: Made it much more efficient

like image 28
Brien Avatar answered Oct 02 '22 13:10

Brien


Sometimes it's faster and easier to generate more samples than you need and then throw away the undesireable values.

One (slow) example.

vec= randi(100,1,1);
for j = 2:50,
   while(abs(vec(j)-vec(j-1)<2) vec(j)= randi(100,1,1);end;
end

Another way. Suppose you want 50 samples

vec = rand(100,100,1);
badindex = find(abs(vec(1:99)-vec(2:100) < 1));
vec(badindex) = vec(badindex-1)+vec(badindex+1);
% if you don't want big values,
vec(vec>100) = vec (vec>100) -100; % to ensure, I hope, that neighbors

% are nonconsecutive (This would be easier in R) .

like image 28
Carl Witthoft Avatar answered Oct 02 '22 14:10

Carl Witthoft


You can make the increment between samples evenly distributed between 2 and N-1 (to avoid consecutive and repeated numbers):

N=10;
k=4;
increments = floor(rand(1,k)*(N-2))+2  %// increments allowed are from 2 to N-1 inclusive
out = mod(cumsum(increments), N)+1   %// sum increments

Same in python:

from numpy import cumsum, floor, mod, random
N=5
k=100
increments = floor(random.rand(1,k)*(N-2))+2
out = mod(cumsum(increments), N)+1
print(out)

[ 5.  3.  1.  5.  2.  4.  3.  2.  4.  2.  4.  3.  1.  5.  4.  3.  5.  4.
  2.  5.  4.  2.  5.  2.  4.  1.  5.  4.  1.  5.  3.  1.  3.  2.  4.  1.
  5.  4.  1.  3.  5.  4.  3.  5.  2.  1.  3.  2.  4.  3.  1.  4.  2.  1.
  3.  2.  1.  4.  3.  2.  1.  3.  5.  3.  5.  4.  2.  4.  2.  1.  3.  2.
  1.  3.  5.  2.  5.  4.  3.  1.  4.  1.  4.  3.  5.  4.  2.  1.  5.  2.
  1.  5.  4.  2.  4.  3.  5.  2.  4.  1.]

Over 100 iterations, even if I limit the number to 1..5, there is no repeated/consecutive number.

like image 38
nicolas Avatar answered Oct 02 '22 14:10

nicolas


A solution in MATLAB (perhaps inelegant) could be something like this:

N = 10;
k = 4;
out = zeros(1,k);

vec = 1 : N;

for idx = 1 : k
    ind = randi(numel(vec), 1);
    left = max(ind-1, 1); right = min(numel(vec), ind+1);
    out(idx) = vec(ind);
    to_remove = ind;
    if vec(left) == vec(ind)-1 
        to_remove = [to_remove left];
    end
    if vec(right) == vec(ind)+1
        to_remove = [to_remove right];
    end
    vec(to_remove) = [];
end

We first declare N and k, then declare an output array of zeroes that is k long. We then generate a sampling vector vec that goes from 1 up to as N initially. Next, for each value we want to place into the output, we generate a random position to sample from the vector, then take a look at the position from the left and from the right... ensuring that we are within the boundaries of the array. Also, we only remove to the left or right if the value at the left of the index to remove and also the right are equal to each other (thanks beaker!)

We use this location and sample from this vector, place the value at this location to the output, then remove the indices in this vector that are to the left, to the right, and the actual index itself from this vector. This removes the possibility of sampling from those values again. We repeat this until we run out of values to place in the output.

Here are a few of trial runs:

>> out

out =

     9     7     1     5

>> out

out =

     7     1     4    10

>> out

out =

    10     8     1     6

>> out

out =

    10     4     8     1
like image 43
rayryeng Avatar answered Oct 02 '22 14:10

rayryeng


A not-particularly-elegant python solution:

def nonseq(n, k):
    out = [random.randint(0, n)]
    while len(out) < k:
        x = random.randint(0, n)
        if abs(x - out[-1]) > 1:
            out.append(x)
    return out
like image 33
mdurant Avatar answered Oct 02 '22 13:10

mdurant


This is a recursive elegant version, I just added a check on k and N to avoid infinite recursion, if k>N/2 no solution exists.

The result is guaranteed random.

import random

def myFunc(N,k):
    if k>(N+1)/2:
        return "k to big for N"
    returnValue = sorted(random.sample(range(1,N+1),k))
    toTest = [x - returnValue[i - 1] for i, x in enumerate(returnValue)][1:]
    if 1 in toTest:
        return myFunc(N,k)
    else:
        return returnValue

print myFunc(10,4)
like image 25
Stijn Nevens Avatar answered Oct 02 '22 14:10

Stijn Nevens


My implementation:

def ncsample(population, k):
    import random
    if k > 0:
        i = random.randrange(0, len(population) - 2*(k-1))
        return [population[i]] + ncsample(population[i+2:], k-1)
    else:
        return []

Note: it randomly finds the sequence in one shot (no rejection sampling in a while loop).

MATLAB implementation:

function r = ncsample(population, k)
    if k > 0
        i = randi(length(population) - 2*(k-1));
        r = [population(i) ncsample(population((i+2):end), k-1)];
    else
        r = [];
    end
end

Some tests:

>> for i=1:10; fprintf('%s\n',sprintf('%d ', ncsample(1:10, 4))); end
1 5 7 9 
3 5 8 10 
3 5 8 10 
4 6 8 10 
2 6 8 10 
1 4 8 10 
1 4 7 9 
3 6 8 10 
1 6 8 10 
2 4 7 9 
like image 30
fferri Avatar answered Oct 02 '22 14:10

fferri