I have a data frame data. For each row I have assigned a weight, stored in data$ww.
Now I would like to draw a sample new_data from data, weighted by data$ww.
I have tried with subset, but it is very slow.
# sample data
data <- data.frame(var1 = log(sample(1:5000)))
ndata <- nrow(data)
maxW <- max(data$var1)
nsample <- 4000
rr <- runif(ndata)
data$ww <- cumsum(exp(data$var1))
new_data <- data[0, ]
i <- 1
while (nrow(new_data) < nsample) {
  new_data[i, ] <- subset(data, data$ww > rr[i] * maxW)[1, ]
  i <- i + 1
}
Is there a faster way?
Use the prob argument of sample():
samp_idx <- sample(seq_len(nrow(data)), nsample, prob=data$ww)
new_data <- data[samp_idx, ]
Something like this should work. Running time is
# user system elapsed
# 0.015 0.000 0.014
versus your version:
# user system elapsed
# 4.278 0.007 4.290
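Two details may be worth checking before reusing the snippet. The loop in the question can pick the same row more than once, whereas sample() draws without replacement by default, and data$ww in the sample data is a cumulative sum rather than a per-row weight. The following is only a sketch under the assumption that the intended per-row weight is exp(var1) and that draws with replacement are wanted; the names w, cw, u and idx2 are introduced here for illustration. It also shows a vectorised inverse-CDF draw with findInterval(), which appears to be what the loop was aiming for.

```r
# Sketch: weighted sampling with replacement.
# Assumption: the per-row weight is exp(var1), not its cumulative sum.
set.seed(42)                                    # only for reproducibility
data <- data.frame(var1 = log(sample(1:5000)))
nsample <- 4000

w <- exp(data$var1)                             # per-row weights (assumed)

# Variant 1: sample() with prob. The weights need not sum to one;
# replace = TRUE allows a row to be drawn more than once.
samp_idx <- sample(seq_len(nrow(data)), nsample, replace = TRUE, prob = w)
new_data <- data[samp_idx, , drop = FALSE]

# Variant 2: vectorised inverse-CDF draw.
# For each uniform draw u, pick the first row whose cumulative weight exceeds u.
cw <- cumsum(w)
u <- runif(nsample) * cw[length(cw)]
idx2 <- pmin(findInterval(u, cw) + 1L, nrow(data))  # pmin guards the upper edge
new_data2 <- data[idx2, , drop = FALSE]
```

Both variants avoid growing a data frame row by row, which is the main cost in the loop. If the weights really are the values stored in data$ww, passing prob = data$ww as in the answer above is the right call.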