I have a data frame like this:
df <- data.frame(time1 = rbinom(100, 1, 0.3),
                 time2 = rbinom(100, 1, 0.4),
                 time3 = rbinom(100, 1, 0.5),
                 time4 = rbinom(100, 1, 0.6))
How can I insert random missing values into each time variable, with up to 20% of the values missing? In this case, that means each column should have fewer than 20 missing values, placed at randomly chosen subjects (rows).
You could do:
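insert_nas <- function(x) {
  len <- length(x)
  # Number of values to blank out: a random count between 1 and 20% of the length
  n <- sample(1:floor(0.2 * len), 1)
  # Random positions (rows) that will become NA
  i <- sample(1:len, n)
  x[i] <- NA
  x
}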
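# lapply() plus as.data.frame() keeps the result a data frame (sapply() would return a matrix)
df2 <- as.data.frame(lapply(df, insert_nas))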
df2
This gives you at most 20% missing values per column:
colSums(is.na(df2)) / nrow(df2)
time1 time2 time3 time4
0.09 0.16 0.19 0.14
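The question asks for fewer than 20 missing values per column, while the function above can produce exactly 20 and never produces zero. A possible variant is sketched below. The name insert_nas_below and the max_frac argument are made up for this illustration, and the seed is set only to make the run reproducible:

```r
# Hypothetical helper (not part of the original answer): blank out a random
# number of entries, strictly fewer than max_frac of the column, possibly none.
insert_nas_below <- function(x, max_frac = 0.2) {
  len <- length(x)
  # Largest count k with k / len < max_frac (19 for len = 100, max_frac = 0.2)
  max_n <- sum(seq_len(len) / len < max_frac)
  # Draw the number of NAs uniformly from 0..max_n
  n <- sample.int(max_n + 1, 1) - 1
  # Pick n distinct positions and set them to NA
  x[sample.int(len, n)] <- NA
  x
}

set.seed(42)  # assumption: fixed seed, only for reproducibility
df <- data.frame(time1 = rbinom(100, 1, 0.3),
                 time2 = rbinom(100, 1, 0.4),
                 time3 = rbinom(100, 1, 0.5),
                 time4 = rbinom(100, 1, 0.6))

# Assigning into df2[] keeps the data frame structure and column names
df2 <- df
df2[] <- lapply(df, insert_nas_below)

# Number of missing values per column
colSums(is.na(df2))
```

Each column draws its own count, so the columns generally end up with different numbers of missing values, and the missing rows are chosen independently for each column.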