
Randomly recode the first and second instance of a value in each row?

Tags: r

I have a data frame in which each row contains two instances of a value (say the value is 34). I would like to replace one instance with 3 and the other with 4, without replacement: if the first instance gets 4, the second gets 3, and vice versa. The assignment should be random, so that some rows use 3 then 4 and other rows use 4 then 3.

Here's my example:

# sample data
df1 <- data.frame(a= c(1, 2, NA, NA),b= c(2, NA, 1, NA),c= c(NA, NA,34, 2),
                 d= c(NA, 34, NA,1),e= c(34, 34,2,34),f= c(34, 1, NA,NA),
                 g= c(NA, NA,34, NA), h= c(NA,NA, NA, 34))

> df1
   a  b  c  d  e  f  g  h
1  1  2 NA NA 34 34 NA NA
2  2 NA NA 34 34  1 NA NA
3 NA  1 34 NA  2 NA 34 NA
4 NA NA  2  1 34 NA NA 34

And here is an output that fits with my goal:

   a  b  c  d e  f  g  h
1  1  2 NA NA 3  4 NA NA
2  2 NA NA  4 3  1 NA NA
3 NA  1  4 NA 2 NA  3 NA
4 NA NA  2  1 4 NA NA  3

In my attempt so far I've been able to identify the columns that hold 34 in each row using which() with apply():

indexes_34 <- apply(df1, 1,  function(x) {which(x == 34)})

And I have randomly generated a list of vectors, one per row, each holding either 3 then 4 or 4 then 3:

ord <- list()
for(i in seq_len(nrow(df1))){
  ord[[i]] <- sample(c(3,4), 2)
}

But I am having trouble with writing code that will assign the values in each 'ord' vector to each row of 'df1' at the correct indexes.

Is there a straightforward way to do this?

asked Mar 01 '23 14:03 by xilliam


2 Answers

One dplyr and purrr option could be:

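library(dplyr)
library(purrr)

# No seed is set, so which 34 becomes 3 and which becomes 4 varies between runs
df1 %>%
    mutate(pmap_dfr(across(everything()), 
                    ~ `[<-`(c(...), which(c(...) == 34), sample(c(3, 4)))))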

   a  b  c  d e  f  g  h
1  1  2 NA NA 4  3 NA NA
2  2 NA NA  4 3  1 NA NA
3 NA  1  4 NA 2 NA  3 NA
4 NA NA  2  1 4 NA NA  3
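
To make the row-wise logic of the one-liner easier to follow, here is a minimal base R sketch of the same idea: find the positions of 34 in a row, then overwrite them with a random permutation of 3 and 4. It is an illustration and not the code from the answer above; `recode_row()` is a hypothetical helper name, and the sketch assumes all columns are numeric and that a row should only be changed when it holds exactly two 34s.

```r
# Sample data from the question
df1 <- data.frame(a = c(1, 2, NA, NA), b = c(2, NA, 1, NA), c = c(NA, NA, 34, 2),
                  d = c(NA, 34, NA, 1), e = c(34, 34, 2, 34), f = c(34, 1, NA, NA),
                  g = c(NA, NA, 34, NA), h = c(NA, NA, NA, 34))

# recode_row() is a hypothetical helper, not part of any package
recode_row <- function(x, target = 34, new = c(3, 4)) {
  idx <- which(x == target)          # positions of the target; which() skips NAs
  if (length(idx) == length(new)) {  # leave rows with an unexpected count untouched
    x[idx] <- sample(new)            # random permutation, so neither value repeats
  }
  x
}

# All columns are numeric, so a matrix round trip keeps the values intact
m <- as.matrix(df1)
for (i in seq_len(nrow(m))) {
  m[i, ] <- recode_row(m[i, ])
}
res <- as.data.frame(m)
res
```

Because no seed is set, the order of 3 and 4 within each row changes from run to run; call `set.seed()` first if a reproducible result is needed.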
answered Apr 06 '23 00:04 by tmfmnk


Here is a way to do it by using which(..., arr.ind = TRUE) to select 34s and replace them:

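library(dplyr)

set.seed(123)
# Row/column positions of every 34
m <- as_tibble(which(df1 == 34, arr.ind = TRUE))
# Shuffle the column positions within each row, then pair them with 3 and 4
m <- m %>%
    group_by(row) %>%
    mutate(col = sample(col), value = c(3, 4)) %>%
    ungroup()
# Matrix indexing: assign each value to its (row, col) cell
df1[as.matrix(m[, 1:2])] <- m$value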
#    a  b  c  d e  f  g  h
# 1  1  2 NA NA 3  4 NA NA
# 2  2 NA NA  3 4  1 NA NA
# 3 NA  1  3 NA 2 NA  4 NA
# 4 NA NA  2  1 4 NA NA  3
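
For readers who prefer to avoid extra packages, the same position-based idea can be sketched in base R. This is a variant written for illustration, not the answerer's code: it swaps the dplyr grouping for `ave()`, which applies a function within each group of rows, and it assumes every row contains exactly two 34s (otherwise `sample()` would fail or leave cells unmatched).

```r
# Sample data from the question
df1 <- data.frame(a = c(1, 2, NA, NA), b = c(2, NA, 1, NA), c = c(NA, NA, 34, 2),
                  d = c(NA, 34, NA, 1), e = c(34, 34, 2, 34), f = c(34, 1, NA, NA),
                  g = c(NA, NA, 34, NA), h = c(NA, NA, NA, 34))

set.seed(123)

# Two-column matrix ("row", "col") locating every 34
pos <- which(df1 == 34, arr.ind = TRUE)

# For each row group, draw a random permutation of 3 and 4.
# Assumption: each row has exactly two 34s, so length(z) is always 2.
vals <- ave(numeric(nrow(pos)), pos[, "row"],
            FUN = function(z) sample(c(3, 4), length(z)))

# Matrix indexing assigns each value to its (row, col) cell
res <- df1
res[pos] <- vals

# Sanity checks: no 34 left, and one 3 and one 4 in every row
stopifnot(!any(res == 34, na.rm = TRUE))
stopifnot(all(rowSums(res == 3, na.rm = TRUE) == 1),
          all(rowSums(res == 4, na.rm = TRUE) == 1))
res
```

Here the values are shuffled instead of the column positions; for two values per row the two choices are equivalent.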
answered Apr 06 '23 00:04 by mt1022