Generating random sublist from ordered list that maintains ordering

Question

Consider a problem where a random sublist of k items, Y, must be selected from X, a list of n items, where the items in Y must appear in the same order as they do in X. The selected items in Y need not be distinct. One solution is this:

for i = 1 to k
    A[i] = floor(rand * n) + 1
    Y[i] = X[A[i]]
sort Y according to the ordering of A

However, this has running time O(k log k) due to the sort operation. To remove this it's tempting to

high_index = n
for i = 1 to k
    index = floor(rand * high_index) + 1
    Y[k - i + 1] = X[index]
    high_index = index

But this gives a clear bias to the returned list due to the uniform index selection. It feels like a O(k) solution is attainable if the indices in the second solution were distributed non-uniformly. Does anyone know if this is the case, and if so what properties the distribution the marginal indices are drawn from has?

amit · Accepted Answer

Unbiased O(n+k) solution is trivial, high-level pseudo code.

create an empty histogram of size n [initialized with all elements as zeros]
populate it with k uniformly distributed variables at range. (do k times histogram[inclusiveRand(1,n)]++)
iterate the initial list [A], while decreasing elements in the histogram and appending elements to the result list.

Explanation [edit]:

The idea is to chose k elements out of n at random, with uniform distribution for each, and create a histogram out of it.
This histogram now contains for each index i, how many times A[i] will appear in the resulting Y list.
Now, iterate the list A in-order, and for each element i, insert A[i] into the resulting Y list histogram[i] times.
This guarantees you maintain the order because you insert elements in order, and "never go back".
It also guarantees unbiased solution since for each i,j,K: P(histogram[i]=K) = P(histogram[j]=K), so for each K, each element has the same probability to appear in the resulting list K times.

I believe it can be done in O(k) using the order statistics [X_(i)] but I cannot figure it out though :\

Generating random sublist from ordered list that maintains ordering

Tags:

algorithm

sublist

timxyz

1 Answers

amit

Recent Activity

Donate For Us

Generating random sublist from ordered list that maintains ordering

Tags:

algorithm

sublist

timxyz

1 Answers

amit

Related questions

Recent Activity

Donate For Us