Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

New columns based off existing column and column located next to it

Tags:

r

dplyr

melt

My dataframe looks like this

ID t1 obs1 t2 obs2 t3 obs3
1  0  a    11 d    0  g
2  0  b    13 e    11 i
3  0  c    0  f    0  h

I need to make sure each ID has at least one t above 10 (delete row if not). Then, I want to save the lowest t value above 10, but also save the corresponding obs in new columns. (The complicated part about my question is that the lowest t above 10 could be in any column). The corresponding obs to some t is located in the next column, so that helps. So my resulting data frame would look like this:

ID t1 obs1 t2 obs2 t3 obs3 lowesttabove10 correspondingobs
1  0  a    11 d    0  g    11             d
2  0  b    13 e    11 i    11             i
like image 689
rdk Avatar asked Dec 05 '25 03:12

rdk


2 Answers

With data.table, go to long format:

library(data.table)
setDT(DT)
dat = melt(DT, measure.vars = patterns("^t\\d+$", "^obs\\d+$"), value.name = c("t", "obs"))
setorder(dat, ID, variable)

#    ID variable  t obs
# 1:  1        1  0   a
# 2:  1        2 11   d
# 3:  1        3  0   g
# 4:  2        1  0   b
# 5:  2        2 13   e
# 6:  2        3 11   i
# 7:  3        1  0   c
# 8:  3        2  0   f
# 9:  3        3  0   h

Find max value per group and mark groups to keep:

IDDT = dat[order(-t), 
  .(max.variable = first(variable), max.t = first(t), max.obs = first(obs))
, by=ID]
IDDT[, keep := max.t > 10]

#    ID max.variable max.t max.obs  keep
# 1:  2            2    13       e  TRUE
# 2:  1            2    11       d  TRUE
# 3:  3            1     0       c FALSE

Find min value over 10 per kept group using a rolling update join:

IDDT[(keep), c("my.variable", "my.t", "my.obs") := {
  m = .(ID = ID, t_thresh = 10)
  dat[m, on=.(ID, t = t_thresh), roll=-Inf, .(x.variable, x.t, x.obs)]
}]

#    ID max.variable max.t max.obs  keep my.variable my.t my.obs
# 1:  2            2    13       e  TRUE           3   11      i
# 2:  1            2    11       d  TRUE           2   11      d
# 3:  3            1     0       c FALSE          NA   NA     NA

I would stop here, with the main data in long format dat and the ID level variables in the separate table IDDT. To filter dat to groups that should be kept: dat[IDDT[(keep), .(ID)], on=.(ID)]. See ?data.table and the other intro materials mentioned when you load the package for details on the syntax.

See ?dcast if you insist on going back to wide.

like image 197
Frank Avatar answered Dec 07 '25 19:12

Frank


Using base R:

Drop all rows with no t-values above 10:

df1 <- df1[rowSums(df1[, grepl("^t", colnames(df1))] >10) > 0, ]

Determine the group that contains the lowest value above 10 and then retrieve values:

df1$group <- apply(df1[grepl("^t", names(df1))], 1, function(x) which(x == min(x[x > 10])))
df1 <- cbind(df1, do.call(rbind, lapply(seq_len(nrow(df1)), 
                                        function(x) setNames(df1[x, paste0(c("t", "obs"), df1$group[x])],
                                                             c("lowesttabove10", "correspondingobs")))))

> df1
  ID t1 obs1 t2 obs2 t3 obs3 group lowesttabove10 correspondingobs
1  1  0    a 11    d  0    g     2             11                d
2  2  0    b 13    e 11    i     3             11                i
like image 45
manotheshark Avatar answered Dec 07 '25 19:12

manotheshark



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!