My sample data.table looks as follows
Starting Data
library(data.table)
x <- data.table(id = as.character(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)),
time = as.character(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)),
status = c("P", "R", "R", "R", "R", "P", "P", "P", "R", "R", "P", "P", "R", "R", "R"),
balance = c(100, 90, 80, 70, 60, 320, 300, 250, 200, 100, 40, 34, 31, 29, 10),
employment = c("Y", "Y", "Y", "N", "N", "N", "N", "N", "N", "Y", "N", "Y", "Y", "Y", "Y")
)
The objective is to copy the information in the columns balance and employment if the status migrates from "P" to "R" by "id". I.e. I would like to use the data in the last period in which a id is "P" and overwrite the existing information for all next periods in which the id is "R".
Hence, the goal is to obtain this data.table
Goal
Y <- data.table(id = as.character(c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)),
time = as.character(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)),
status = c("P", "R", "R", "R", "R", "P", "P", "P", "R", "R", "P", "P", "R", "R", "R"),
balance = c(100, 100, 100, 100, 100, 320, 300, 250, 250, 250, 40, 34, 34, 34, 34),
employment = c("Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y")
)
Note that the columns time and status itselft (and of course id) are not affected.
I have tried to use seq_len
across id's, then set this column to zero if status is "R and search for the max value (by id) of this column to use as an indicator which row has to be copied. I'm sure there is a faster and better way to resolve this. Maybe even an one-liner.
If anything is unclear please let me know
To enable copying and pasting DataTable data by the CTRL+C/CTRL+V keyboard shortcuts you should use the clipboard parameter. The parameter can have the following values: "block" (also set as true);
One way to do this is to set the desired columns to NA
s when status == R
and then carry the last observation forward (LOCF), since all the id
s start with P
, I don't think you really need to do this by id
and hence improve performance. Here's a way
## Define column names you want to modify
cols <- c("balance", "employment")
## Assign `NA`s when satus == "R"
x[status == "R", (cols) := NA]
## Carry the last observation forward and update by reference
library(zoo) ## You could do this with base R too, just more writing
x[, (cols) := lapply(.SD, na.locf), .SDcols = cols]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With