Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I create a panel dataset out of transition date data in R?

I have a dataset that is structured as follows:

ID   origin   destination  time
1     a        b           2
2     b        a           1
2     a        c           4
3     c        b           1
3     b        c           3

I would like to turn this into a ID-time panel dataset like:

ID   location  time
1     a        1
1     b        2
1     b        3
1     b        4
2     a        1
2     a        2
2     a        3
2     c        4
3     b        1
3     b        2
3     c        3
3     c        4

So basically, I need to create the panel rows for when a subject doesn't change location, and fill in the location they are supposed to be at based on the info on origin and destinations. Is there any function in R that can do this smoothly? I'd prefer solutions using data.table or dplyr.

like image 969
Kenji Avatar asked Dec 23 '22 07:12

Kenji


1 Answers

You could make a table with every time for which you want to know the location of each ID:

newDT = DT[, CJ(ID = unique(ID), time = 1:4)]

Then put the original data in long format, inferring that

  • origin holds for time-1
  • destination holds for time
mDT = melt(DT, id = c("ID", "time"), value.name = "loc", variable.name = "loc_role")
mDT[loc_role == "origin", time := time - 1L]
mDT[, loc_role := NULL]
setorder(mDT, ID, time)

    ID time loc
 1:  1    1   a
 2:  1    2   b
 3:  2    0   b
 4:  2    1   a
 5:  2    3   a
 6:  2    4   c
 7:  3    0   c
 8:  3    1   b
 9:  3    2   b
10:  3    3   c

...and fill in the new table with a rolling update join:

newDT[, location := mDT[.SD, on=.(ID, time), roll=TRUE, x.loc]]

    ID time location
 1:  1    1        a
 2:  1    2        b
 3:  1    3        b
 4:  1    4        b
 5:  2    1        a
 6:  2    2        a
 7:  2    3        a
 8:  2    4        c
 9:  3    1        b
10:  3    2        b
11:  3    3        c
12:  3    4        c

(Dplyr doesn't have rolling or update joins yet, so I guess there's no analogue.)

How it works

  • CJ takes the Cartesian product of some vectors, similar to expand.grid
  • melt transforms to long form, keeping variables passed as id =
  • x[i, v := expr] edits column v of table x on rows selected by i
  • setorder sorts in place
  • .SD in j of x[i,j] refers to the subset of data (x) selected by i
  • x[i, on=, roll=, expr] is a rolling join, with rows selected by table i, on= and roll=
  • The expression x.v inside a join selects column v from x

Regarding the last bullet, the prefix i.* would do the same thing for columns from i.

like image 80
Frank Avatar answered Dec 25 '22 21:12

Frank