My dataframe looks like this
ID t1 obs1 t2 obs2 t3 obs3
1 0 a 11 d 0 g
2 0 b 13 e 11 i
3 0 c 0 f 0 h
I need to make sure each ID has at least one t above 10 (delete row if not). Then, I want to save the lowest t value above 10, but also save the corresponding obs in new columns. (The complicated part about my question is that the lowest t above 10 could be in any column). The corresponding obs to some t is located in the next column, so that helps. So my resulting data frame would look like this:
ID t1 obs1 t2 obs2 t3 obs3 lowesttabove10 correspondingobs
1 0 a 11 d 0 g 11 d
2 0 b 13 e 11 i 11 i
With data.table, go to long format:
library(data.table)
setDT(DT)
dat = melt(DT, measure.vars = patterns("^t\\d+$", "^obs\\d+$"), value.name = c("t", "obs"))
setorder(dat, ID, variable)
# ID variable t obs
# 1: 1 1 0 a
# 2: 1 2 11 d
# 3: 1 3 0 g
# 4: 2 1 0 b
# 5: 2 2 13 e
# 6: 2 3 11 i
# 7: 3 1 0 c
# 8: 3 2 0 f
# 9: 3 3 0 h
Find max value per group and mark groups to keep:
IDDT = dat[order(-t),
.(max.variable = first(variable), max.t = first(t), max.obs = first(obs))
, by=ID]
IDDT[, keep := max.t > 10]
# ID max.variable max.t max.obs keep
# 1: 2 2 13 e TRUE
# 2: 1 2 11 d TRUE
# 3: 3 1 0 c FALSE
Find min value over 10 per kept group using a rolling update join:
IDDT[(keep), c("my.variable", "my.t", "my.obs") := {
m = .(ID = ID, t_thresh = 10)
dat[m, on=.(ID, t = t_thresh), roll=-Inf, .(x.variable, x.t, x.obs)]
}]
# ID max.variable max.t max.obs keep my.variable my.t my.obs
# 1: 2 2 13 e TRUE 3 11 i
# 2: 1 2 11 d TRUE 2 11 d
# 3: 3 1 0 c FALSE NA NA NA
I would stop here, with the main data in long format dat and the ID level variables in the separate table IDDT. To filter dat to groups that should be kept: dat[IDDT[(keep), .(ID)], on=.(ID)]. See ?data.table and the other intro materials mentioned when you load the package for details on the syntax.
See ?dcast if you insist on going back to wide.
Using base R:
Drop all rows with no t-values above 10:
df1 <- df1[rowSums(df1[, grepl("^t", colnames(df1))] >10) > 0, ]
Determine the group that contains the lowest value above 10 and then retrieve values:
df1$group <- apply(df1[grepl("^t", names(df1))], 1, function(x) which(x == min(x[x > 10])))
df1 <- cbind(df1, do.call(rbind, lapply(seq_len(nrow(df1)),
function(x) setNames(df1[x, paste0(c("t", "obs"), df1$group[x])],
c("lowesttabove10", "correspondingobs")))))
> df1
ID t1 obs1 t2 obs2 t3 obs3 group lowesttabove10 correspondingobs
1 1 0 a 11 d 0 g 2 11 d
2 2 0 b 13 e 11 i 3 11 i
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With