I have a dataset that is structured as follows:
ID origin destination time
1 a b 2
2 b a 1
2 a c 4
3 c b 1
3 b c 3
I would like to turn this into a ID-time panel dataset like:
ID location time
1 a 1
1 b 2
1 b 3
1 b 4
2 a 1
2 a 2
2 a 3
2 c 4
3 b 1
3 b 2
3 c 3
3 c 4
So basically, I need to create the panel rows for when a subject doesn't change location, and fill in the location they are supposed to be at based on the info on origin and destinations. Is there any function in R that can do this smoothly? I'd prefer solutions using data.table or dplyr.
You could make a table with every time
for which you want to know the location
of each ID
:
newDT = DT[, CJ(ID = unique(ID), time = 1:4)]
Then put the original data in long format, inferring that
time-1
time
mDT = melt(DT, id = c("ID", "time"), value.name = "loc", variable.name = "loc_role")
mDT[loc_role == "origin", time := time - 1L]
mDT[, loc_role := NULL]
setorder(mDT, ID, time)
ID time loc
1: 1 1 a
2: 1 2 b
3: 2 0 b
4: 2 1 a
5: 2 3 a
6: 2 4 c
7: 3 0 c
8: 3 1 b
9: 3 2 b
10: 3 3 c
...and fill in the new table with a rolling update join:
newDT[, location := mDT[.SD, on=.(ID, time), roll=TRUE, x.loc]]
ID time location
1: 1 1 a
2: 1 2 b
3: 1 3 b
4: 1 4 b
5: 2 1 a
6: 2 2 a
7: 2 3 a
8: 2 4 c
9: 3 1 b
10: 3 2 b
11: 3 3 c
12: 3 4 c
(Dplyr doesn't have rolling or update joins yet, so I guess there's no analogue.)
How it works
CJ
takes the Cartesian product of some vectors, similar to expand.grid
melt
transforms to long form, keeping variables passed as id =
x[i, v := expr]
edits column v
of table x
on rows selected by i
setorder
sorts in place.SD
in j
of x[i,j]
refers to the subset of data (x
) selected by i
x[i, on=, roll=, expr]
is a rolling join, with rows selected by table i
, on=
and roll=
x.v
inside a join selects column v
from x
Regarding the last bullet, the prefix i.*
would do the same thing for columns from i
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With