
R: sample() command subject to a constraint

Tags:

r

I am trying to randomly sample 7 numbers from 0 to 7 (with replacement), but subject to the constraint that the numbers chosen add up to 7. So for instance, the output 0 1 1 2 3 0 0 is okay, but the output 1 2 3 4 5 6 7 is not. Is there a way to use the sample command with added constraints?

I intend to use the replicate() function with the sample command as an argument, to return a list of N different vectors from the sample command. The way I am currently using the sample command (without any constraints), I need N to be very large in order to get as many vectors that sum to exactly 7 as possible. I figure there must be an easier way to do this!

Here is my code for that part:

x <- replicate(100000, sample(0:7, 7, replace=TRUE))

Ideally, I want 10,000 or 100,000 vectors in x to sum to 7, but would need an enormous N value to do this. Thanks for any help.

Kirk Fogg asked Sep 20 '14 17:09



1 Answer

To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:

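library(gtools)
# All 8^7 length-7 vectors with entries in 0:7 (order matters, repeats allowed)
perms <- permutations(8, 7, 0:7, repeats.allowed=TRUE)
# Keep only the rows whose entries sum to 7
perms7 <- perms[rowSums(perms) == 7,]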
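
The number of rows to expect in perms7 can be worked out without enumerating anything. The following is a minimal sketch, assuming the standard stars-and-bars counting argument (it is not part of the original answer); the names n_valid and accept_rate are only illustrative. It also shows why the unconstrained replicate() approach in the question needs such a large N: only a tiny fraction of the 8^7 possible draws sum to 7.

```r
# Stars and bars: number of ordered 7-tuples of non-negative integers summing to 7
n_valid <- choose(7 + 7 - 1, 7 - 1)
n_valid
# 1716

# Fraction of all 8^7 unconstrained draws from 0:7 that happen to sum to 7
accept_rate <- n_valid / 8^7
accept_rate
```

With an acceptance rate below 0.1%, plain rejection sampling would need well over a thousand unconstrained draws per accepted vector on average.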

From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:

set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=TRUE),]
head(my.perms)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    0    0    0    2    5    0    0
# [2,]    1    3    0    1    2    0    0
# [3,]    1    4    1    1    0    0    0
# [4,]    1    0    0    3    0    3    0
# [5,]    0    2    0    0    0    5    0
# [6,]    1    1    2    0    0    2    1

An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.
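
If building the full 8^7-row matrix is undesirable (for example with more positions or a larger target sum), one alternative is to draw each vector directly. This is a different technique from the one in the answer: a sketch based on the stars-and-bars bijection, in which choosing 6 "bar" positions uniformly among 13 slots corresponds to a uniformly chosen vector of 7 non-negative integers summing to 7. The function name sample_composition is hypothetical, and this generates each vector separately, so for the sizes in the question the lookup into perms7 is likely still faster.

```r
# Draw one vector of k non-negative integers summing to n, uniformly over
# all such vectors, via the stars-and-bars bijection.
# (sample_composition is a hypothetical helper name, not from the answer.)
sample_composition <- function(n = 7, k = 7) {
  # Pick k - 1 bar positions among n + k - 1 slots; each subset is equally likely
  bars <- sort(sample(n + k - 1, k - 1))
  # The number of stars between consecutive bars gives each entry
  diff(c(0, bars, n + k)) - 1
}

set.seed(144)
# replicate() returns one draw per column, so transpose to get one draw per row
draws <- t(replicate(1000, sample_composition()))
head(draws)
```

Every row of draws sums to 7, and because no entry can exceed the total, all values automatically fall in 0:7.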

Here's a distribution of counts of numbers in the sample:

#      0      1      2      3      4      5      6      7 
# 323347 188162 102812  51344  22811   8629   2472    423 
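
The answer does not show how the table was produced; presumably it was something like table(my.perms). The sketch below is self-contained and uses base R's expand.grid as a stand-in for gtools::permutations, which is an assumption made here so that it runs without extra packages. The names grid, valid, samp and counts are illustrative, and because the row order may differ from gtools, the exact counts will not match those above even with the same seed.

```r
# Base-R stand-in for permutations(8, 7, 0:7, repeats.allowed=TRUE);
# row order may differ from gtools, so exact counts will differ from the answer's
grid <- as.matrix(expand.grid(rep(list(0:7), 7)))
valid <- grid[rowSums(grid) == 7, ]

set.seed(144)
samp <- valid[sample(nrow(valid), 100000, replace = TRUE), ]

# Tabulate how often each value 0..7 appears across all sampled entries
counts <- table(factor(samp, levels = 0:7))
counts
```

The counts always total 700,000 (100,000 vectors of 7 entries each), as they do in the table above.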
josliber answered Oct 08 '22 20:10