I am trying to generate a random sample that excludes certain "bad data." I do not know whether the data is "bad" until after I sample it. Thus, I need to make a random draw from the population and then test it. If the data is "good" then keep it. If the data is "bad" then randomly draw another and test it. I would like to do this until my sample size reaches 25. Below is a simplified example of my attempt to write a function that does this. Can anyone please tell me what I am missing?
df <- data.frame(NAME=c(rep('Frank',10),rep('Mary',10)), SCORE=rnorm(20))
df
random.sample <- function(x) {
  x <- df[sample(nrow(df), 1), ]
  if (x$SCORE > 0) return(x)
  #if (x$SCORE <= 0) run the function again
}
random.sample(df)
repeat loop in R: A repeat loop runs a block of code over and over. It has no condition check of its own, so the only way to leave it is to call break (or to return from the enclosing function).
A repeat loop is useful whenever statements must run an unknown number of times until some condition is met. In R the statements to be repeated are enclosed in braces after the repeat keyword; there is no closing "end repeat" statement. Repeat loops may be nested.
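In R, a minimal illustration of repeat and break might look like the following. The counter i is only there for demonstration and is not part of the original question.

```r
i <- 0
repeat {
  i <- i + 1          # body runs at least once
  if (i >= 3) break   # break is the only exit from repeat
}
i
```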
Here is a general use of a while loop:
random.sample <- function(df) {
  success <- FALSE
  while (!success) {
    # draw one row at random
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # check for success
    success <- x$SCORE > 0
  }
  return(x)
}
An alternative is to use repeat (syntactic sugar for while(TRUE)) and break:
random.sample <- function(df) {
  repeat {
    # draw one row at random
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # exit if the condition is met
    if (x$SCORE > 0) break
  }
  return(x)
}
where break makes you exit the repeat block. Alternatively, you could have if (x$SCORE > 0) return(x) to exit the function directly.
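To reach the sample size of 25 mentioned in the question, the single-draw loop can be wrapped so that accepted rows accumulate. This is only a sketch, assuming that "good" means SCORE > 0 as in the example. The names draw.good, is.good, and max.tries are hypothetical, and draws are made with replacement, so the same row can be kept more than once.

```r
# Rejection sampling: keep drawing single rows until n good ones are collected.
draw.good <- function(data, n, is.good = function(row) row$SCORE > 0,
                      max.tries = 10000) {
  kept <- vector("list", n)   # preallocate storage for accepted rows
  found <- 0
  tries <- 0
  while (found < n) {
    tries <- tries + 1
    if (tries > max.tries) stop("gave up: too few good rows found")
    row <- data[sample(nrow(data), 1), ]   # random draw
    if (is.good(row)) {                    # test only after drawing
      found <- found + 1
      kept[[found]] <- row
    }
  }
  do.call(rbind, kept)   # combine accepted rows into one data frame
}

set.seed(1)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))
good <- draw.good(df, 25)
nrow(good)
all(good$SCORE > 0)
```

The max.tries guard is a design choice to stop the loop if the population contains no good rows at all.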
Use this after your first sample; it redraws the bad rows until none remain:
while (any(bad <- (x$SCORE <= 0)))
x[bad, ] <- df[sample(nrow(df), sum(bad)), ]
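For context, here is how that loop might fit into a complete draw of 25 rows. It is a sketch under two assumptions: that sampling with replacement is acceptable (the example population has only 20 rows), and that at least one row has SCORE > 0, since otherwise the loop never ends. The name n is introduced for illustration.

```r
set.seed(42)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))
stopifnot(any(df$SCORE > 0))  # guard: without a good row the loop would never end

n <- 25
# first sample: n rows drawn with replacement
x <- df[sample(nrow(df), n, replace = TRUE), ]

# redraw only the rows that failed the test until none are left
while (any(bad <- (x$SCORE <= 0))) {
  x[bad, ] <- df[sample(nrow(df), sum(bad), replace = TRUE), ]
}
rownames(x) <- NULL  # old row labels no longer match the replaced rows

nrow(x)
all(x$SCORE > 0)
```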
You can just select the rows to sample directly like so (just 5):
> df <- data.frame(NAME=c(rep('Frank',10),rep('Mary',10)), SCORE=rnorm(20))
> df[sample(which(df$SCORE>0), 5),]
NAME SCORE
14 Mary 1.0858854
10 Frank 0.7037989
16 Mary 0.7688913
5 Frank 0.2067499
17 Mary 0.4391216
This samples without replacement; for a bootstrap, add replace=TRUE.
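Because the example population has only 20 rows, a sample of 25 has to be drawn with replacement. A short sketch, assuming the good rows can be identified up front (which the question says is not always possible); good.rows, pick, and smp are hypothetical names. Since sample() treats a single number as the range from 1 to that number, the sketch draws positions within good.rows rather than sampling good.rows directly.

```r
set.seed(7)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))

good.rows <- which(df$SCORE > 0)   # indices of rows that pass the test
# sample positions within good.rows, so a single qualifying row is not
# misread by sample() as the range 1:row
pick <- good.rows[sample(seq_along(good.rows), 25, replace = TRUE)]
smp <- df[pick, ]

nrow(smp)
all(smp$SCORE > 0)
```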
random.sample <- function(x) {
  x <- df[sample(nrow(df), 1), ]
  if (x$SCORE > 0) return(x)
  Recall(x)  # run the function again
}
random.sample(df)
# NAME SCORE
#14 Mary 1.252566
It seems to me that this should work as well:
df$SCORE[ df$SCORE > 0 ][ sample(1:sum(df$SCORE > 0), 1) ]
#[1] 0.6579631