sample vector exactly according to the probability given

Tags: r, sample

I believe there should be a function for this in R, but I have not been able to find it. I need a vector whose contents follow the given probabilities exactly. I thought sample could do this, but it is not exactly what I want.

sample(c(1, 2, 3, 4), size = 4, prob=c(0.25, 0.25, 0.25, 0.25)) 

gives

# [1] 1 3 4 2

which is correct.

Then I try

sample(c(1, 2, 3, 4), size = 8, replace = TRUE, prob=c(0.25, 0.25, 0.25, 0.25))

# [1] 1 4 4 3 2 3 1 3

What I actually need is something like

# [1] 1 4 4 2 2 3 1 3

or

# [1] 2 3 1 1 4 4 2 3

or something similar, where the output is divided exactly according to the given probabilities. In this example each element of c(1, 2, 3, 4) should make up 0.25 of the output vector, so with size = 8 each element should appear exactly 2 times. Is there already a function in R for this, or do I have to write a custom one?

asked Dec 09 '15 18:12 by Ronak Shah


1 Answer

Since you want the number of repetitions of each value to be deterministic, rather than random, use rep (instead of sample) to repeat each value in proportion to its probability in prob. Then you can create random permutations of the resulting vector.

x = c(1,2,3,4)

prob = c(0.1,0.2,0.3,0.4)

# Total sample size
n = 20

# Repeat each value round(n * prob) times: here 2, 4, 6 and 8 times
result = rep(x, round(n * prob))

result

# [1] 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4
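
One caveat with round(n * prob): when n * prob is not a whole number, the rounded counts may not add up to n. For example, round(10 * rep(0.25, 4)) gives 2 for every value (R rounds 2.5 to the even digit), so the result has only 8 elements. A minimal sketch of one way around this, assuming a largest-remainder allocation is acceptable; exact_counts is a hypothetical helper, not a base R function:

```r
# Hypothetical helper (not part of base R): integer counts that always sum to n.
exact_counts <- function(n, prob) {
  prob <- prob / sum(prob)       # normalise in case prob does not sum to 1
  raw <- n * prob                # ideal, possibly fractional, counts
  counts <- floor(raw + 1e-9)    # small tolerance guards against floating-point error
  leftover <- n - sum(counts)    # slots still unassigned after flooring
  if (leftover > 0) {
    frac <- raw - counts         # fractional part lost by flooring
    top <- order(frac, decreasing = TRUE)[seq_len(leftover)]
    counts[top] <- counts[top] + 1   # hand the leftover slots to the largest remainders
  }
  counts
}

x <- c(1, 2, 3, 4)
counts <- exact_counts(10, rep(0.25, 4))
result10 <- rep(x, counts)
length(result10)   # always equals n, here 10
```

With whole-number targets such as n = 20 and prob = c(0.1, 0.2, 0.3, 0.4), this sketch returns the same counts as round(n * prob). When n cannot be split exactly, some values necessarily get one more repetition than others.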

Then to create, say, 100 random permutations:

replicate(100, sample(result))
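
Applied to the example from the question (size 8, equal probabilities), the same idea might look like the sketch below. The names pool, shuffled and perms are illustrative, and set.seed is only there to make the shuffle reproducible:

```r
set.seed(1)  # illustrative only: makes the random shuffle reproducible

x <- c(1, 2, 3, 4)
prob <- c(0.25, 0.25, 0.25, 0.25)
n <- 8

# Deterministic counts: each value is repeated n * prob = 2 times
pool <- rep(x, round(n * prob))

# One random ordering of the pool
shuffled <- sample(pool)

# Every value still appears exactly twice, whatever the order
table(shuffled)

# replicate() returns a matrix with one permutation per column
perms <- replicate(100, sample(pool))
dim(perms)   # 8 rows (elements), 100 columns (permutations)
```

Calling sample with a single vector argument and no size permutes the whole vector, so the counts fixed by rep are preserved.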
answered Oct 13 '22 00:10 by eipi10