I am wondering what the best way to solve this is. Essentially I want to generate 20 samples which add to 100 but also where (x1+x2>20). I am struggling to get something that is fast and efficient. I realise that I could filter out the lines that don't meet this criteria but it isn't efficient if I generate 10,000 rather than 20.
The code is as below:
n = 20
x1 = sample(0:100,n,replace = TRUE)
x2 = sample(0:100,n,replace = TRUE)
x3 = sample(0:100,n,replace = TRUE)
index = (x1+x2+x3)>100
G=(x1+x2)>20
while(sum(index)>0&&sum(G)>0){
x1[index&&G] = sample(0:100,n,replace = TRUE)
x2[index&&G] = sample(0:100,n,replace = TRUE)
x3[index&&G] = sample(0:100,n,replace = TRUE)
index =(x1+x2+x3)>100
G=(x1+x2)>20
}
x4=rep(100,n)-x1-x2-x3
df <- data.frame(x1,x2,x3,x4)
Thanks in advance.
Another possibility:
Pick three breaks of the sequence 0:100.
x1, x2, x3 and x4 are then generated in-between those breaks. If x1 + x2 is smaller than 20, then x3 + x4 is greater than 20, so we can swap them.
generate_four_numbers <- function(from = 0, to = 100) {
breaks <- sort(sample(seq(from, to), 3 ,replace = TRUE))
x1 <- breaks[1]
x2 <- breaks[2] - breaks[1]
x3 <- breaks[3] - breaks[2]
x4 <- 100 - breaks[3]
if (x1 + x2 <= 20) {
return(data.frame(x1 = x4, x2 = x3, x3 = x2, x4 = x1)
}
data.frame(x1, x2, x3, x4)
}
res <- do.call(rbind, lapply(1:10000, function(x) generate_four_numbers()))
table(rowSums(res)) # all at 100
length(which(res$x1 + res$x2 > 20)) / nrow(res) # 100 % acceptable
Here is an unbiased way to pick k numbers in the range 0:n which sum to n. It is based on the stars and bars encoding:
#picks k random numbers in range 0:n which sum to n:
pick <- function(k,n){
m <- n + k - 1 #number of stars and bars
bars <- sort(sample(1:m,k-1)) #positions of the bars
c(bars,m+1)-c(0,bars)-1
}
This generates a single example, returning a vector. As @Guillaume Devailly observes in their answer, most of the samples will satisfy the additional constraint on the sum of the first 2 numbers, so you can just filter out those that don't.
Note that if you want 4 numbers in the range 1:100 which sum to 100 you could just use 1 + pick(4,96).
To enforce the constraint on the first two numbers:
pick.sample <- function(){
while(TRUE){
x <- pick(4,100)
if(sum(x[1:2]) >20) return(x)
}
}
Then
df <- data.frame(t(replicate(10000,pick.sample())))
will create a 10,000 row dataframe where each row is a sample which satisfies the constraints.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With