Suppose I have the following DF:
| C1 | C2 |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 1 | 1 |
| 0 | 0 |
| . | . |
| . | . |
I now want to apply these following conditions on the Dataframe:
If both these conditions are true, I change the C1 and C2 value for that row to 2
I understand this can be done by using the apply function, and I have used the following:
C1 <- c(0, 1,1,0,1,0,1,0,1,0,1)
C2 <- c(0, 1,1,0,1,0,1,0,1,0,1)
df <- data.frame(C1, C2)
fun <- function(x){
if (sample(0:5, 1) < 2){
x[1:2] <- 2
}
return (x)
}
index <- df$C1 ==1 // First Condition
processed_Df <-t(apply(df[index,],1,fun)) // Applies Second Condition
df[index,] <- processed_Df
Output:
| C1 | C2 |
|---|---|
| 0 | 0 |
| 2 | 2 |
| 1 | 1 |
| 0 | 0 |
| . | . |
| . | . |
Some Rows have both conditions met, some doesn't (This is the main functionality, I would like to achieve)
Now I want to achieve this same using vectorization and without using loops or the apply function. The only confusion I have is "If I don't use apply, won't each row get the same result based on the condition's result? (For example, the following:)
df$C1 <- ifelse(df$C1==1 & sample(0:5, 1) < 5, 2, df$C1)
This changes all the rows in my DF with C1==2 to 2 when there should possibly be many 1's.
Is there a way to get different results for the second condition for each row without using the apply function? Hopefully my question makes sense.
Thanks
You need to sample the values for nrow times. Try this method -
set.seed(167814)
df[df$C1 == 1 & sample(0:5, nrow(df), replace = TRUE) < 2, ] <- 2
df
# C1 C2
#1 0 0
#2 2 2
#3 2 2
#4 0 0
#5 1 1
#6 0 0
#7 2 2
#8 0 0
#9 1 1
#10 0 0
#11 1 1
Here is a fully vectorized way. Create the logical index index just like in the question. Then sample all random integers r in one call to sample. Replace in place based on the conjunction of the index and the condition r < 2.
x <- 'C1 C2
0 0
1 1
1 1
0 0'
df1 <- read.table(textConnection(x), header = TRUE)
set.seed(1)
index <- df1$C1 == 1
r <- sample(0:5, length(index), TRUE)
df1[index & r < 2, c("C1", "C2")] <- 2
df1
#> C1 C2
#> 1 0 0
#> 2 1 1
#> 3 2 2
#> 4 0 0
Created on 2022-05-11 by the reprex package (v2.0.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With