
How can I use vectorisation in R to change a DF value based on a condition?

Suppose I have the following DF:

C1 C2
0 0
1 1
1 1
0 0
. .
. .

I now want to apply the following conditions to the data frame:

  • The value for C1 should be 1
  • A random integer between 0 and 5 should be less than 2

If both conditions are true, I change the C1 and C2 values for that row to 2.

I understand this can be done by using the apply function, and I have used the following:

C1 <- c(0, 1,1,0,1,0,1,0,1,0,1)
C2 <- c(0, 1,1,0,1,0,1,0,1,0,1)

df <- data.frame(C1, C2)

fun <- function(x){
  if (sample(0:5, 1) < 2){
    x[1:2] <- 2
  }
  return (x)
}

index <- df$C1 == 1  # First condition
processed_Df <- t(apply(df[index, ], 1, fun))  # Applies second condition
df[index, ] <- processed_Df

Output:

C1 C2
0 0
2 2
1 1
0 0
. .
. .

Some rows meet both conditions and some don't (this is the main functionality I would like to achieve).

Now I want to achieve the same thing using vectorization, without loops or the apply function. The only confusion I have is: if I don't use apply, won't each row get the same result for the second condition? For example, the following:

df$C1 <- ifelse(df$C1==1 & sample(0:5, 1) < 5, 2, df$C1)

This changes all the rows in my DF with C1 == 1 to 2, when there should possibly still be many 1's.

Is there a way to get different results for the second condition for each row without using the apply function? Hopefully my question makes sense.

Thanks

asked Dec 30 '25 06:12

pr0grmr


2 Answers

You need to draw one random value per row, i.e. sample nrow(df) times with replacement. Try this:

set.seed(167814)
df[df$C1 == 1 & sample(0:5, nrow(df), replace = TRUE) < 2, ] <- 2
df

#   C1 C2
#1   0  0
#2   2  2
#3   2  2
#4   0  0
#5   1  1
#6   0  0
#7   2  2
#8   0  0
#9   1  1
#10  0  0
#11  1  1
answered Jan 03 '26 16:01

Ronak Shah
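
As a small illustration of why the approach above gives each row its own outcome, the sketch below contrasts a single draw with one draw per row. It reuses the data from the question. The names single_draw, per_row_draw and hit, and the seed value, are assumptions made for this example and do not come from the original post.

```r
# Toy data frame mirroring the one in the question.
df <- data.frame(C1 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1),
                 C2 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1))

set.seed(42)  # arbitrary seed, used only to make the run reproducible

# A single draw gives a length-1 logical. Inside `&` or ifelse() it is
# recycled, so every row sees the same TRUE or FALSE.
single_draw <- sample(0:5, 1) < 2
length(single_draw)    # 1

# One draw per row gives a logical vector with an independent result
# for each row of the data frame.
per_row_draw <- sample(0:5, nrow(df), replace = TRUE) < 2
length(per_row_draw)   # 11, i.e. nrow(df)

# Combine with the first condition and assign to both columns at once.
hit <- df$C1 == 1 & per_row_draw
df[hit, c("C1", "C2")] <- 2
df
```

Naming the columns explicitly in the assignment is a design choice for this sketch: it keeps any additional columns untouched, whereas an empty column index overwrites the whole row.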


Here is a fully vectorized way. Create the logical index (index) just like in the question. Then draw all the random integers (r) in a single call to sample. Finally, replace in place wherever both index and the condition r < 2 hold.

x <- 'C1    C2
0   0
1   1
1   1
0   0'
df1 <- read.table(textConnection(x), header = TRUE)

set.seed(1)
index <- df1$C1 == 1
r <- sample(0:5, length(index), TRUE)
df1[index & r < 2, c("C1", "C2")] <- 2
df1
#>   C1 C2
#> 1  0  0
#> 2  1  1
#> 3  2  2
#> 4  0  0

Created on 2022-05-11 by the reprex package (v2.0.1)

answered Jan 03 '26 16:01

Rui Barradas
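
A possible variation on the two answers, not taken from either of them, is to draw random integers only for the rows that already satisfy the first condition. This mirrors the apply version in the question, which also samples only for rows with C1 == 1. The names candidates and chosen, and the seed, are hypothetical and chosen for this sketch.

```r
# Same four-row example as in the answer above, built directly.
df1 <- data.frame(C1 = c(0, 1, 1, 0), C2 = c(0, 1, 1, 0))

set.seed(1)  # arbitrary seed, used only to make the run reproducible

# Row numbers that pass the first condition.
candidates <- which(df1$C1 == 1)

# One random integer per candidate row, drawn in a single call.
r <- sample(0:5, length(candidates), replace = TRUE)

# Candidate rows that also pass the second condition.
chosen <- candidates[r < 2]

# Defensive guard: skip the assignment when no row was chosen.
if (length(chosen) > 0) {
  df1[chosen, c("C1", "C2")] <- 2
}
df1
```

For a data frame where only a few rows have C1 == 1 this draws fewer random numbers, at the cost of an extra indexing step. Because the number of draws differs, the same seed will not reproduce the results of the full-length version.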


