Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R generating new data.table slowdown

Tags:

r

data.table

I have a data.table listing intervals during which patients were exposed and the start and stop times during which they were observed for exposures. Exposure happens over intervals during observation. I want to generate the opposite intervals in which the patients were unexposed.

The data I have are formatted like the following:

library(data.table)
DT = fread("
id t0  t  s tn
1   1  2  4 15
1   1  6  7 15
1   1 10 12 15
2   4  5  7 44
2   4  9 15 44
2   4 17 35 44")

t0 is the start time of observation, t is the start of exposure, s is the end of exposure, and tn is the end of observation. An example might be exposure to extreme UV in workers. ID 1 begins work in the first month of 2000, then in the second month is exposed to extreme UV in normal working conditions, and this stops in the fourth month as cloud cover persisted for 2 months. The "s" ending time represents the clouds in the fourth month. They persist for 2 months in "safe" conditions until the 6th month when UV rises again for another 2 months. And so on...

I want to generate the intervals in which the exposure did not happen based on these data. That is, when the participant is in the "safe" conditions. The example output would be:

id  t  s
1   1  2
1   4  6
1   7 10
1  12 15
2   4  5
2   7  9
2  15 17
2  35 44

The first step I have is to set a data.table in the following manner:

nonexp <- DT[, .(t=c(t0[1], s), s=c(t, tn[1])), by=id]

but in my dataset of over 140,000 events, this is running VERY slow. I am working in a remote compute environment that is heavily moderated, so I cannot tell whether the system is running slowly or my code is bad.

Is this code obviously suboptimal in some important ways? Is there a faster way to do this?

like image 640
AdamO Avatar asked Jan 01 '26 08:01

AdamO


1 Answers

I'd store the data so that time variables are not split over multiple columns:

# bounds table
bdDT = melt(unique(DT[, .(id, t0, tn)]), id = "id", value.name = "t")
bdDT[variable == "t0", status := "safe"]
bdDT[variable == "tn", status := "end"]
bdDT[, variable := NULL ]

# core table
treatDT = melt(DT, id="id", value.name = "t", meas = c("t", "s"))
treatDT[variable == "t", status := "treated"]
treatDT[variable == "s", status := "safe"]
treatDT[, variable := NULL ]

# stack
res = unique(rbind(treatDT, bdDT), by=c("id", "t"))
setkey(res, id, t)

The data now looks like

    id  t  status
 1:  1  1    safe
 2:  1  2 treated
 3:  1  4    safe
 4:  1  6 treated
 5:  1  7    safe
 6:  1 10 treated
 7:  1 12    safe
 8:  1 15     end
 9:  2  4    safe
10:  2  5 treated
11:  2  7    safe
12:  2  9 treated
13:  2 15    safe
14:  2 17 treated
15:  2 35    safe
16:  2 44     end

From here, if you want to browse safe spells, there's...

> res[status == "safe"][res[status != "safe"], on=.(id, t), roll=TRUE, 
  .(id, start = x.t, end = i.t)
]

   id start end
1:  1     1   2
2:  1     4   6
3:  1     7  10
4:  1    12  15
5:  2     4   5
6:  2     7   9
7:  2    15  17
8:  2    35  44

Note: if some treatment goes all the way to tn, it will not show up here as a zero-length spell.


Alternately, if you have enough RAM, a cleaner way is to expand the data...

idDT = unique(DT[, .(id, start = t0, end = tn)], by="id")
fullDT = idDT[, .(t = start:end), by=id]

fullDT[, status := "safe"]
fullDT[DT, on=.(id, t >= t, t < s), status := "treated"]

From there, you can collapse to spells for easier browsing

fullDT[, 
  .(start = first(t), end = last(t))
, by=.(id, status, g = rleid(id, status))][, !"g"][,
  end := replace(end + 1L, .N, last(end))
, by=id][]

    id  status start end
 1:  1    safe     1   2
 2:  1 treated     2   4
 3:  1    safe     4   6
 4:  1 treated     6   7
 5:  1    safe     7  10
 6:  1 treated    10  12
 7:  1    safe    12  15
 8:  2    safe     4   5
 9:  2 treated     5   7
10:  2    safe     7   9
11:  2 treated     9  15
12:  2    safe    15  17
13:  2 treated    17  35
14:  2    safe    35  44

The replace step is needed because the OP writes overlapping end and start dates.

like image 82
Frank Avatar answered Jan 03 '26 03:01

Frank



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!