 

Column difference from baseline

Tags: r, difference

This may be a duplicate question; I wouldn't be surprised if that's the case.

This is an example of the dataset I am dealing with:

ID    Type    Time1     Time2     Time3
1     A1      12.23     NA        NA
2     A1       0.35     0.53      NA
2     A2       5.78     NA        10.25
3     A5       NA       NA        4.19
4     A3       NA       3.18      7.15
5     A5       10.91    4.56      2.45
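A reproducible version of this sample data makes it easier to try the answers. The sketch below builds it as a base R data frame. The name `df` matches what both answers assume. The column types (integer `ID`, character `Type`, numeric times with `NA` for missing values) are an assumption about how the real data is stored.

```r
# Sample data from the question; missing times are encoded as NA
df <- data.frame(
  ID    = c(1L, 2L, 2L, 3L, 4L, 5L),
  Type  = c("A1", "A1", "A2", "A5", "A3", "A5"),
  Time1 = c(12.23, 0.35, 5.78, NA, NA, 10.91),
  Time2 = c(NA, 0.53, NA, NA, 3.18, 4.56),
  Time3 = c(NA, NA, 10.25, 4.19, 7.15, 2.45)
)
```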

My goal is to create two columns, Delta1 and Delta2, defined as follows:

Delta1: the difference Time2 - Time1, computed only for rows where all three values (Time1, Time2 and Time3) are available. For example, the last row (ID 5) has all three times, so Delta1 = 4.56 - 10.91 = -6.35.
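The Delta1 rule can be expressed in base R with `complete.cases()`, which flags rows that have no missing values in the selected columns. This is a sketch that assumes the time columns are named `Time1` to `Time3` as in the table above. Only those columns are rebuilt here so the snippet stays self-contained.

```r
# Time columns from the sample data (NA = missing)
df <- data.frame(
  Time1 = c(12.23, 0.35, 5.78, NA, NA, 10.91),
  Time2 = c(NA, 0.53, NA, NA, 3.18, 4.56),
  Time3 = c(NA, NA, 10.25, 4.19, 7.15, 2.45)
)

# TRUE only for rows where Time1, Time2 and Time3 are all present
all_present <- complete.cases(df[c("Time1", "Time2", "Time3")])

# Delta1 = Time2 - Time1 on those rows, NA everywhere else
df$Delta1 <- ifelse(all_present, df$Time2 - df$Time1, NA)
df$Delta1
```

Only the last row (ID 5) gets a value, matching the -6.35 worked out above.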

Delta2: the difference between the two latest available time values, i.e. Time3 - Time2 if both exist, otherwise Time3 - Time1, otherwise Time2 - Time1. If a row has fewer than two time values, Delta2 is 0.
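One way to read the Delta2 rule, consistent with the expected output below, is "latest available time minus the available time just before it". The following is a minimal base R sketch of that interpretation. The helper `last_two_diff()` is a hypothetical name, and the times are entered as a plain matrix with columns ordered Time1, Time2, Time3.

```r
# Hypothetical helper: difference between the two latest non-missing times,
# or 0 when fewer than two times are available
last_two_diff <- function(times) {
  present <- times[!is.na(times)]  # keeps Time1 -> Time3 order
  n <- length(present)
  if (n < 2) return(0)
  present[n] - present[n - 1]
}

# Rows of the sample data; columns are Time1, Time2, Time3
times <- rbind(
  c(12.23,    NA,    NA),
  c( 0.35,  0.53,    NA),
  c( 5.78,    NA, 10.25),
  c(   NA,    NA,  4.19),
  c(   NA,  3.18,  7.15),
  c(10.91,  4.56,  2.45)
)

# Apply the helper to each row (MARGIN = 1)
delta2 <- apply(times, 1, last_two_diff)
round(delta2, 2)
```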

Final expected output

ID    Type    Time1     Time2     Time3     Delta1     Delta2
1     A1      12.23     NA        NA                   0
2     A1       0.35     0.53      NA                   0.18
2     A2       5.78     NA        10.25                4.47  
3     A5       NA       NA        4.19                 0
4     A3       NA       3.18      7.15                 3.97
5     A5       10.91    4.56      2.45      -6.35     -2.11

Any help is much appreciated. Thanks in advance.

Ahir Bhairav Orai asked Mar 14 '26 00:03


2 Answers

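# Delta1: Time2 - Time1, but only for rows where all three times are present
df$Delta1 <- ifelse(!is.na(df$Time1) & !is.na(df$Time2) & !is.na(df$Time3),
                    df$Time2 - df$Time1,
                    NA)

# Delta2: difference between the two latest non-missing times in each row,
# or 0 when a row has fewer than two times (the \(i) lambda needs R >= 4.1)
df$Delta2 <- vapply(seq_len(nrow(df)), \(i) {
  vals <- na.omit(c(df$Time3[i], df$Time2[i], df$Time1[i]))  # latest first
  d <- vals[1] - vals[2]
  if (is.na(d)) return(0)
  d
}, numeric(1))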

Result:

> df
  ID Type Time1 Time2 Time3 Delta1 Delta2
1  1   A1 12.23    NA    NA     NA   0.00
2  2   A1  0.35  0.53    NA     NA   0.18
3  2   A2  5.78    NA 10.25     NA   4.47
4  3   A5    NA    NA  4.19     NA   0.00
5  4   A3    NA  3.18  7.15     NA   3.97
6  5   A5 10.91  4.56  2.45  -6.35  -2.11
Caspar V. answered Mar 16 '26 13:03


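With dplyr, you could use coalesce() to find the first non-missing element: for Delta2 it tries Time3 - Time2, then Time3 - Time1, then Time2 - Time1, and falls back to 0. For Delta1, if_any() flags rows where any Time column is NA.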

library(dplyr)

df %>%
  mutate(Delta1 = ifelse(if_any(starts_with("Time"), is.na), NA, Time2-Time1),
         Delta2 = coalesce(Time3-Time2, Time3-Time1, Time2-Time1, 0))

#   ID Type Time1 Time2 Time3 Delta1 Delta2
# 1  1   A1 12.23    NA    NA     NA   0.00
# 2  2   A1  0.35  0.53    NA     NA   0.18
# 3  2   A2  5.78    NA 10.25     NA   4.47
# 4  3   A5    NA    NA  4.19     NA   0.00
# 5  4   A3    NA  3.18  7.15     NA   3.97
# 6  5   A5 10.91  4.56  2.45  -6.35  -2.11
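If dplyr is not available, the same `coalesce()` chain can be approximated in base R. The helper `coalesce_base()` below is a hypothetical name, not part of base R. It folds `ifelse(is.na(a), b, a)` over its arguments with `Reduce()`, so each element takes the first non-missing value in the order given. Unlike `dplyr::coalesce()`, it does no type or length checking, so treat it as a sketch rather than a drop-in replacement.

```r
# Hypothetical base R stand-in for dplyr::coalesce(): first non-NA per element
coalesce_base <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

df <- data.frame(
  Time1 = c(12.23, 0.35, 5.78, NA, NA, 10.91),
  Time2 = c(NA, 0.53, NA, NA, 3.18, 4.56),
  Time3 = c(NA, NA, 10.25, 4.19, 7.15, 2.45)
)

# Same priority as the dplyr answer: Time3-Time2, Time3-Time1, Time2-Time1, else 0
df$Delta2 <- with(df, coalesce_base(Time3 - Time2, Time3 - Time1,
                                    Time2 - Time1, 0))
df$Delta2
```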
Darren Tsai answered Mar 16 '26 13:03


