Create time-to-event and time-after-event variables

I am working on panel data that looks like this:

d <- data.frame(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"),
                time = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                iz = c(0,1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1))
   id time iz
1   a    1  0
2   a    2  1
3   a    3  1
4   a    4  0
5   a    5  0
6   b    1  0
7   b    2  0
8   b    3  0
9   b    4  0
10  b    5  1
11  c    1  0
12  c    2  0
13  c    3  0
14  c    4  1
15  c    5  1

Here iz is an indicator for an event or a treatment (iz = 1). What I need is a variable that counts the periods before and after an event, i.e. the signed distance to and from the event (negative before it, 0 during it, positive after it). This variable would look like this:

   id time iz nvar
1   a    1  0   -1
2   a    2  1    0
3   a    3  1    0
4   a    4  0    1
5   a    5  0    2
6   b    1  0   -4
7   b    2  0   -3
8   b    3  0   -2
9   b    4  0   -1
10  b    5  1    0
11  c    1  0   -3
12  c    2  0   -2
13  c    3  0   -1
14  c    4  1    0
15  c    5  1    0

I have tried working with the answers given here and here but can't make it work in my case.

I would really appreciate any ideas how to approach this problem. Thank you in advance for all ideas and suggestions.

Niklas asked Jan 29 '21 12:01


2 Answers

1) rleid This code applies rleid from data.table within each id and then, run by run, generates a negative reverse sequence for a leading run of 0's and a forward sequence for every other run, i.e. we assume that a forward positive sequence should be used except before the first run of ones. Rows where iz is 1 are then zeroed out. There can be any number of runs in an id, and it also supports ids with only 0's or only 1's. It assumes that time has no gaps, since it counts rows rather than time units.

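library(data.table)  # for rleid()

# Seq: a run whose values are 1 (a leading run of 0's) counts down to -1; any other run counts up from 1
Seq <- function(x, s = seq_along(x)) if (x[1] == 1) -rev(s) else s
# nvar: label the runs of iz with rleid, keep the run id only on 0 rows, then apply Seq to each run
nvar <- function(iz, r = rleid(iz)) ave((1-iz) * r, r, FUN = Seq)
# apply nvar within each id and set the rows with iz == 1 to 0
transform(d, nvar = (1-iz) * ave(iz, id, FUN = nvar))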

giving:

   id time iz nvar
1   a    1  0   -1
2   a    2  1    0
3   a    3  1    0
4   a    4  0    1
5   a    5  0    2
6   b    1  0   -4
7   b    2  0   -3
8   b    3  0   -2
9   b    4  0   -1
10  b    5  1    0
11  c    1  0   -3
12  c    2  0   -2
13  c    3  0   -1
14  c    4  1    0
15  c    5  1    0
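
To see what the helpers in (1) do, the intermediate vectors can be traced for the iz values of id a and for a made-up id with two runs of ones. This is a sketch that reuses Seq and nvar exactly as defined above; iz_a, iz_x, r_a, res_a and res_x are illustrative names, not part of the answer.

```r
library(data.table)  # provides rleid()

# Helpers copied from answer (1)
Seq <- function(x, s = seq_along(x)) if (x[1] == 1) -rev(s) else s
nvar <- function(iz, r = rleid(iz)) ave((1-iz) * r, r, FUN = Seq)

iz_a <- c(0, 1, 1, 0, 0)          # iz values of id "a"
r_a  <- rleid(iz_a)               # run ids: 1 2 2 3 3
(1 - iz_a) * r_a                  # 1 0 0 3 3: only the leading run of 0's keeps the value 1
res_a <- (1 - iz_a) * nvar(iz_a)
res_a                             # -1 0 0 1 2

# Hypothetical id with two runs of ones: the 0's between the runs
# count forward from the end of the first run
iz_x <- c(0, 1, 0, 0, 1, 0)
res_x <- (1 - iz_x) * nvar(iz_x)
res_x                             # -1 0 1 2 0 1
```

The second case illustrates the "any number of runs" claim: only the zeros before the first run of ones get negative values.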

2) base This code uses only base R. It assumes that every id has at most one run of ones; there is no restriction on whether there are any zeros. It also supports gaps in time. It applies nvar to the row numbers of each id. First it calculates the range rng of the times at which iz is 1, and then it calculates the signed distance in the last line of nvar. The output is identical to that shown in (1). If we could assume that every id has exactly one run of 1's, the if statement could be omitted.

nvar <- function(ix) with(d[ix, ], {
  if (all(iz == 0)) return(iz)
  rng <- range(time[iz == 1])
  (time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])
})
transform(d, nvar = ave(1:nrow(d), id, FUN = nvar))
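
The signed-distance expression in the last line of nvar can be checked in isolation. The sketch below wraps the same logic in a standalone function of two vectors (signed_dist is a hypothetical name, not from the answer) and applies it to made-up times with a gap, where the distance comes out in time units rather than rows.

```r
# Same logic as nvar in (2), but taking time and iz directly (hypothetical helper)
signed_dist <- function(time, iz) {
  if (all(iz == 0)) return(iz)          # no event in this id: all zeros
  rng <- range(time[iz == 1])           # first and last event time
  (time < rng[1]) * (time - rng[1]) +   # negative before the first event
    (time > rng[2]) * (time - rng[2])   # positive after the last event
}

# Gap between times 5 and 9: the last row is 4 time units after the event
gap <- signed_dist(time = c(1, 4, 5, 9), iz = c(0, 1, 1, 0))
gap        # -3 0 0 4

# With two runs of ones, everything between the first and last event is 0,
# which is why (2) assumes at most one run of ones per id
two_runs <- signed_dist(time = 1:6, iz = c(0, 1, 0, 0, 1, 0))
two_runs   # -1 0 0 0 0 1
```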

2a) This variation of (2) passes time and iz to nvar by encoding them as the real and imaginary parts of a complex vector, in order to avoid having to deal with row numbers; otherwise it is the same as (2). We have omitted the if statement from (2), but it could be added back in if any ids have no ones.

nvar <- function(x, time = Re(x), iz = Im(x), rng = range(time[iz == 1])) 
  (time < rng[1]) * (time - rng[1]) + (time > rng[2]) * (time - rng[2])
transform(d, nvar = Re(ave(time + iz * 1i, id, FUN = nvar)))
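
The complex-number trick in (2a) works because ave() passes only a single vector to FUN. A minimal sketch with made-up vectors (id, time, iz, z, first_event and firsts below are illustrative, not from the answer) shows that the encoding is lossless and that FUN still sees both columns for one id at a time.

```r
# Hypothetical toy vectors (not the question's d)
id   <- c("a", "a", "a", "b", "b")
time <- c(1, 2, 3, 1, 2)
iz   <- c(0, 1, 0, 1, 0)

z <- time + iz * 1i    # pack both columns into one complex vector
Re(z)                  # recovers time
Im(z)                  # recovers iz

# ave() splits z by id, so FUN can unpack time and iz for one id;
# its real result is stored back into a complex vector, hence the Re()
first_event <- function(x) rep(min(Re(x)[Im(x) == 1]), length(x))
firsts <- Re(ave(z, id, FUN = first_event))
firsts                 # 2 2 2 1 1
```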
G. Grothendieck answered Oct 02 '22 14:10


Here is a solution that is a (tiny) bit more complex than the one from G. Grothendieck, but it is able to handle non-sequential times.

library( data.table )
# make d a data.table (by reference)
setDT(d)

# you can remove the trailing [], they just print the result to the console
# nvar = 0 where iz = 1
d[ iz == 1, nvar := 0 ][]
# calculate nvar for iz == 0 BEFORE iz == 1, using a backward rolling join
# (roll = -Inf: next observation carried backward)
# create subsets for readability
d1 <- d[ iz == 1, ]
d0 <- d[ iz == 0, ]
d[ iz == 0, nvar := time - d1[ d0, x.time, on = .(id, time), roll = -Inf ] ][]
# calculate nvar for iz == 0 AFTER iz == 1, using a forward rolling join
# (roll = Inf: last observation carried forward)
# create subsets for readability
d1 <- d[ iz == 1, ]
d0 <- d[ iz == 0 & is.na( nvar ), ]
d[ iz == 0 & is.na(nvar) , nvar := time - d1[ d0, x.time, on = .(id, time), roll = Inf ] ][]

#     id time iz nvar
#  1:  a    1  0   -1
#  2:  a    2  1    0
#  3:  a    3  1    0
#  4:  a    4  0    1
#  5:  a    5  0    2
#  6:  b    1  0   -4
#  7:  b    2  0   -3
#  8:  b    3  0   -2
#  9:  b    4  0   -1
# 10:  b    5  1    0
# 11:  c    1  0   -3
# 12:  c    2  0   -2
# 13:  c    3  0   -1
# 14:  c    4  1    0
# 15:  c    5  1    0
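
The two joins in this answer differ only in the roll argument. A minimal sketch on made-up tables (events, queries, nocb and locf are illustrative names) shows which event time each query row picks up: roll = -Inf takes the next event time (next observation carried backward) and roll = Inf takes the previous one (last observation carried forward).

```r
library(data.table)

# Hypothetical toy tables: two event times for id "a", and three query times
events  <- data.table(id = "a", time = c(2, 3))
queries <- data.table(id = "a", time = c(1, 4, 5))

# roll = -Inf: match the NEXT event time; NA when there is no later event
nocb <- events[queries, x.time, on = .(id, time), roll = -Inf]
nocb                  # 2 NA NA

# roll = Inf: match the PREVIOUS event time; NA when there is no earlier event
locf <- events[queries, x.time, on = .(id, time), roll = Inf]
locf                  # NA 3 3

# Signed distances as in the answer: negative before, positive after
queries$time - nocb   # -1 NA NA
queries$time - locf   # NA 1 2
```

This is why the answer runs the roll = -Inf join first and then fills only the remaining NA rows with the roll = Inf join.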
Wimpel answered Oct 02 '22 12:10