Generate a number is range (1,n) but not in a list (i,j)

Question

How can I generate a random number that is in the range (1,n) but not in a certain list (i,j)?

Example: range is (1,500), list is [1,3,4,45,199,212,344].

Note: The list may not be sorted

Timothy Shields · Accepted Answer

Rejection Sampling

One method is rejection sampling:

Generate a number x in the range (1, 500)
Is x in your list of disallowed values? (Can use a hash-set for this check.)
- If yes, return to step 1
- If no, x is your random value, done

This will work fine if your set of allowed values is significantly larger than your set of disallowed values:
if there are G possible good values and B possible bad values, then the expected number of times you'll have to sample x from the G + B values until you get a good value is (G + B) / G (the expectation of the associated geometric distribution). (You can sense check this. As G goes to infinity, the expectation goes to 1. As B goes to infinity, the expectation goes to infinity.)

Sampling a List

Another method is to make a list L of all of your allowed values, then sample L[rand(L.count)].

sh1 · Answer

The technique I usually use when the list is length 1 is to generate a random integer r in [1,n-1], and if r is greater or equal to that single illegal value then increment r.

This can be generalised for a list of length k for small k but requires sorting that list (you can't do your compare-and-increment in random order). If the list is moderately long, then after the sort you can start with a bsearch, and add the number of values skipped to r, and then recurse into the remainder of the list.

For a list of length k, containing no value greater or equal to n-k, you can do a more direct substitution: generate random r in [1,n-k], and then iterate through the list testing if r is equal to list[i]. If it is then set r to n-k+i (this assumes list is zero-based) and quit.

That second approach fails if some of the list elements are in [n-k,n].

I could try to invest something clever at this point, but what I have so far seems sufficient for uniform distributions with values of k much less than n...

Create two lists -- one of illegal values below n-k, and the other the rest (this can be done in place).
Generate random r in [1,n-k]
Apply the direct substitution approach for the first list (if r is list[i] then set r to n-k+i and go to step 5).
If r was not altered in step 3 then we're finished.
Sort the list of larger values and use the compare-and-increment method.

Observations:

If all values are in the lower list, there will be no sort because there is nothing to sort.
If all values are in the upper list, there will be no sort because there is no occasion on which r is moved into the hazardous area.
As k approaches n, the maximum size of the upper (sorted) list grows.
For a given k, if more value appear in the upper list (the bigger the sort), the chance of getting a hit in the lower list shrinks, reducing the likelihood of needing to do the sort.

Refinement: Obviously things get very sorty for large k, but in such cases the list has comparatively few holes into which r is allowed to settle. This could surely be exploited.

I might suggest something different if many random values with the same list and limits were needed. I hope that the list of illegal values is not the list of results of previous calls to this function, because if it is then you wouldn't want any of this -- instead you would want a Fisher-Yates shuffle.

Richard Fung · Answer

Rejection sampling would be the simplest if possible as described already. However, if you didn't want use that, you could convert the range and disallowed values to sets and find the difference. Then, you could choose a random value out of there.

Assuming you wanted the range to be in [1,n] but not in [i,j] and that you wanted them uniformly distributed.

In Python

total = range(1,n+1)
disallowed = range(i,j+1)
allowed = list( set(total) - set(disallowed) )

return allowed[random.randrange(len(allowed))]

(Note that this is not EXACTLY uniform since in all likeliness, max_rand%len(allowed) != 0 but this will in most practical applications be very close)

Generate a number is range (1,n) but not in a list (i,j)

Tags:

algorithm

random

math

Jayram

3 Answers

Rejection Sampling

Sampling a List

Timothy Shields

sh1

Richard Fung

Recent Activity

Donate For Us

Generate a number is range (1,n) but not in a list (i,j)

Tags:

algorithm

random

math

Jayram

3 Answers

Rejection Sampling

Sampling a List

Timothy Shields

sh1

Richard Fung

Related questions

Recent Activity

Donate For Us