Generate random number of missing values in R

Tags:

random

r

I have a data frame like this:

df<-data.frame(time1=rbinom(100,1,0.3),
               time2=rbinom(100,1,0.4),
               time3=rbinom(100,1,0.5),
               time4=rbinom(100,1,0.6))

How can I insert random missing values into each time variable, with up to 20% of the values missing? In this case, each column should end up with fewer than 20 missing values, placed at random across subjects (rows).

David Z asked Dec 11 '22 08:12


1 Answer

You could do:

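insert_nas <- function(x) {
  len <- length(x)
  # number of NAs to insert: between 1 and 20% of the length
  # (assumes length(x) >= 5, so that floor(0.2 * len) >= 1)
  n <- sample(1:floor(0.2 * len), 1)
  # positions to set to NA, drawn without replacement
  i <- sample(seq_len(len), n)
  x[i] <- NA
  x
}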

df2 <- sapply(df, insert_nas)
df2

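This gives each column at least one and at most 20% missing values. Note that sapply() returns a matrix rather than a data frame. To check the proportion of missing values per column: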

colSums(is.na(df2)) / nrow(df2)

time1 time2 time3 time4 
 0.09  0.16  0.19  0.14 
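
The question asks for fewer than 20 missing values per column, whereas the function above can produce exactly 20. A minimal sketch of a stricter variant follows, which also keeps the result as a data frame. The names insert_nas_strict, max_prop and df3 are hypothetical and not part of the original answer, and the seed is arbitrary:

```r
# Sketch only: insert_nas_strict(), max_prop and df3 are illustrative names.
set.seed(42)  # arbitrary seed so the run is reproducible

df <- data.frame(time1 = rbinom(100, 1, 0.3),
                 time2 = rbinom(100, 1, 0.4),
                 time3 = rbinom(100, 1, 0.5),
                 time4 = rbinom(100, 1, 0.6))

insert_nas_strict <- function(x, max_prop = 0.2) {
  len <- length(x)
  # largest count strictly below max_prop * len
  max_na <- ceiling(max_prop * len) - 1
  if (max_na < 1) return(x)        # vector too short to blank anything
  n <- sample.int(max_na, 1)       # number of NAs, uniform on 1..max_na
  x[sample.int(len, n)] <- NA      # blank n distinct random positions
  x
}

df3 <- df
df3[] <- lapply(df, insert_nas_strict)  # assigning into df3[] keeps the data frame

colSums(is.na(df3))  # number of missing values per column
```

Assigning into df3[] with lapply() instead of calling sapply() keeps the column names and the data frame class, which is convenient if the result is passed on to modelling functions.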
Mark Heckmann answered Jan 17 '23 14:01