New columns based off existing column and column located next to it

Question

My dataframe looks like this

ID t1 obs1 t2 obs2 t3 obs3
1  0  a    11 d    0  g
2  0  b    13 e    11 i
3  0  c    0  f    0  h

I need to make sure each ID has at least one t above 10 (delete row if not). Then, I want to save the lowest t value above 10, but also save the corresponding obs in new columns. (The complicated part about my question is that the lowest t above 10 could be in any column). The corresponding obs to some t is located in the next column, so that helps. So my resulting data frame would look like this:

ID t1 obs1 t2 obs2 t3 obs3 lowesttabove10 correspondingobs
1  0  a    11 d    0  g    11             d
2  0  b    13 e    11 i    11             i

Frank · Accepted Answer

With data.table, go to long format:

library(data.table)
setDT(DT)
dat = melt(DT, measure.vars = patterns("^t\d+$", "^obs\d+$"), value.name = c("t", "obs"))
setorder(dat, ID, variable)

#    ID variable  t obs
# 1:  1        1  0   a
# 2:  1        2 11   d
# 3:  1        3  0   g
# 4:  2        1  0   b
# 5:  2        2 13   e
# 6:  2        3 11   i
# 7:  3        1  0   c
# 8:  3        2  0   f
# 9:  3        3  0   h

Find max value per group and mark groups to keep:

IDDT = dat[order(-t), 
  .(max.variable = first(variable), max.t = first(t), max.obs = first(obs))
, by=ID]
IDDT[, keep := max.t > 10]

#    ID max.variable max.t max.obs  keep
# 1:  2            2    13       e  TRUE
# 2:  1            2    11       d  TRUE
# 3:  3            1     0       c FALSE

Find min value over 10 per kept group using a rolling update join:

IDDT[(keep), c("my.variable", "my.t", "my.obs") := {
  m = .(ID = ID, t_thresh = 10)
  dat[m, on=.(ID, t = t_thresh), roll=-Inf, .(x.variable, x.t, x.obs)]
}]

#    ID max.variable max.t max.obs  keep my.variable my.t my.obs
# 1:  2            2    13       e  TRUE           3   11      i
# 2:  1            2    11       d  TRUE           2   11      d
# 3:  3            1     0       c FALSE          NA   NA     NA

I would stop here, with the main data in long format dat and the ID level variables in the separate table IDDT. To filter dat to groups that should be kept: dat[IDDT[(keep), .(ID)], on=.(ID)]. See ?data.table and the other intro materials mentioned when you load the package for details on the syntax.

See ?dcast if you insist on going back to wide.

manotheshark · Answer

Using base R:

Drop all rows with no t-values above 10:

df1 <- df1[rowSums(df1[, grepl("^t", colnames(df1))] >10) > 0, ]

Determine the group that contains the lowest value above 10 and then retrieve values:

df1$group <- apply(df1[grepl("^t", names(df1))], 1, function(x) which(x == min(x[x > 10])))
df1 <- cbind(df1, do.call(rbind, lapply(seq_len(nrow(df1)), 
                                        function(x) setNames(df1[x, paste0(c("t", "obs"), df1$group[x])],
                                                             c("lowesttabove10", "correspondingobs")))))

> df1
  ID t1 obs1 t2 obs2 t3 obs3 group lowesttabove10 correspondingobs
1  1  0    a 11    d  0    g     2             11                d
2  2  0    b 13    e 11    i     3             11                i

New columns based off existing column and column located next to it

Tags:

r

dplyr

melt

rdk

2 Answers

Frank

manotheshark

Recent Activity

Donate For Us

New columns based off existing column and column located next to it

Tags:

r

dplyr

melt

rdk

2 Answers

Frank

manotheshark

Related questions

Recent Activity

Donate For Us