Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reservoir sampling

To retrieve k random numbers from an array of undetermined size we use a technique called reservoir sampling. Can anybody briefly highlight how it happens with a sample code?

like image 569
Jony Avatar asked Apr 10 '10 07:04

Jony


People also ask

What is biased reservoir sampling?

Biased Reservoir Sampling In biased reservoir sampling Alg. 3.1, [2] the probability of a data point x(t) being in the reservoir is a decreasing function of its lingering time within R. So the probability of finding points of the sooner history in R is high. Very old data points will be in R with very low probability.

What are the types of sampling in stream data?

There are two basic ways of generating a random sample of any data set – sampling without replacement and sampling with replacement. Consider a data stream with N elements and a sample size n. In random sampling with replacement, each element of the sample is chosen at random from among all N elements of the data set.

What is sampling and different types of sampling techniques?

There are two types of sampling methods: Probability sampling involves random selection, allowing you to make strong statistical inferences about the whole group. Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you to easily collect data.

How does weighted random sampling work?

In weighted random sampling (WRS) the items are weighted and the probability of each item to be selected is determined by its relative weight.


2 Answers

I actually did not realize there was a name for this, so I proved and implemented this from scratch:

import random def random_subset( iterator, K ):     result = []     N = 0      for item in iterator:         N += 1         if len( result ) < K:             result.append( item )         else:             s = int(random.random() * N)             if s < K:                 result[ s ] = item      return result 

From: http://web.archive.org/web/20141026071430/http://propersubset.com:80/2010/04/choosing-random-elements.html

With a proof near the end.

like image 163
Larry Avatar answered Oct 15 '22 04:10

Larry


Following Knuth's (1981) description more closely, Reservoir Sampling (Algorithm R) could be implemented as follows:

import random  def sample(iterable, n):     """     Returns @param n random items from @param iterable.     """     reservoir = []     for t, item in enumerate(iterable):         if t < n:             reservoir.append(item)         else:             m = random.randint(0,t)             if m < n:                 reservoir[m] = item     return reservoir 
like image 45
sam Avatar answered Oct 15 '22 06:10

sam