
How to select n random values from each row of a dataframe in R?

Tags: r, dplyr

I have a dataframe

df= data.frame(a=c(56,23,15,10),
              b=c(43,NA,90.7,30.5),
              c=c(12,7,10,2),
              d=c(1,2,3,4),
              e=c(NA,45,2,NA))

I want to keep two randomly chosen non-NA values in each row and convert the rest to NA.

Required output (will differ because of randomness):

df= data.frame(
              a=c(56,NA,15,NA),
              b=c(43,NA,NA,NA),
              c=c(NA,7,NA,2),
              d=c(NA,NA,3,4),
              e=c(NA,45,NA,NA))

Code used
I know how to select random non-NA values from a specific row:

set.seed(2)
sample(which(!is.na(df[1,])),2)

But I have no idea how to apply it to the whole dataframe and get the required output.

Amit asked Oct 26 '22 10:10



1 Answer

You can write a function that keeps n random non-NA values in a row and sets the rest to NA.

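keep_n_value <- function(x, n) {
  # positions of the non-NA values in this row
  x1 <- which(!is.na(x))
  # index into x1 instead of calling sample(x1, n): sample() treats a
  # single number k as 1:k, and min() copes with rows that have fewer
  # than n non-NA values
  keep <- x1[sample.int(length(x1), min(n, length(x1)))]
  x[-keep] <- NA
  x
}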

Apply the function by row using base R -

set.seed(123)
df[] <- t(apply(df, 1, keep_n_value, 2))
df
#   a    b  c  d  e
#1 NA   NA 12  1 NA
#2 NA   NA  7  2 NA
#3 NA 90.7 10 NA NA
#4 NA 30.5 NA  4 NA
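
Because the output is random, it can help to check the result rather than compare it by eye. The following is a minimal sketch, assuming the df from the question; keep_n_value is restated here with index-based sampling so the block runs on its own, and the names orig and res are only illustrative.

```r
# Rebuild the data frame from the question
df <- data.frame(a = c(56, 23, 15, 10),
                 b = c(43, NA, 90.7, 30.5),
                 c = c(12, 7, 10, 2),
                 d = c(1, 2, 3, 4),
                 e = c(NA, 45, 2, NA))

# Keep up to n random non-NA values of x, set everything else to NA
keep_n_value <- function(x, n) {
  x1 <- which(!is.na(x))
  keep <- x1[sample.int(length(x1), min(n, length(x1)))]
  x[-keep] <- NA
  x
}

orig <- df                                  # untouched copy for comparison
res <- df
set.seed(123)
res[] <- t(apply(df, 1, keep_n_value, 2))   # row-wise, then restore the shape

# Number of surviving values per row; each entry should be 2
rowSums(!is.na(res))

# Surviving values should be identical to the originals
all(res == orig, na.rm = TRUE)
```

The checks hold for any seed, since they test only the number of retained values and not which ones were picked.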

Or if you prefer tidyverse -

purrr::pmap_df(df, ~ keep_n_value(c(...), 2))
Ronak Shah answered Oct 29 '22 14:10
