I have working R code, but it is inelegant and inefficient. I am wondering if there is a better way: i.e. how can I vectorize this process and/or reduce the computing time?
library(data.table)
dt <- data.table(
visited_a = c(1, 1, 0, 0),
visited_b = c(1, 0, 0, 0),
visited_c = c(0, 0, 1, 1),
purchased = c("b", "b", "c", "a")
)
My data.table has dummy indicators for whether a consumer visited a store in 2017. So visited_a = 0 means she did not visit store a in 2017, while visited_b = 1 means she did visit store b in 2017. The data also list which store the consumer purchased from in 2018; all of these consumers made a purchase. Thus a consumer may or may not have visited the store (last year) that she purchased from (this year).
I want to add a variable purchased_was_visited to capture this. The desired result is:
dt$purchased_was_visited <- c(1, 0, 1, 0)
Here is my extraordinarily inelegant code, which sadly loops through the data.table one row at a time. There must be a better way!
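# initialise as numeric so the 0/1 values are not coerced to logical
dt[ , purchased_was_visited := NA_real_]
for (i in seq_len(nrow(dt))) {
  brand <- dt[i, purchased]
  col <- paste0("visited_", brand)
  was_it <- dt[[col]][i]   # a single number, not a one-cell data.table
  dt[i, purchased_was_visited := was_it]
}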
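A minimal sketch of one vectorized alternative that stays in the wide layout, using base R two-column matrix indexing (a swapped-in technique, not part of the question or the answer). It assumes every dummy column is named visited_<store> and that purchased holds the matching <store> suffix; the names visit_cols, visits and col_idx are illustrative only.

```r
library(data.table)

dt <- data.table(
  visited_a = c(1, 1, 0, 0),
  visited_b = c(1, 0, 0, 0),
  visited_c = c(0, 0, 1, 1),
  purchased = c("b", "b", "c", "a")
)

# Names of the dummy columns: "visited_a", "visited_b", "visited_c"
visit_cols <- grep("^visited_", names(dt), value = TRUE)

# Wide 0/1 matrix: one row per consumer, one column per store
visits <- as.matrix(dt[, ..visit_cols])

# For each consumer, the column position of the store she purchased from
col_idx <- match(paste0("visited_", dt$purchased), visit_cols)

# Indexing with a (row, column) matrix picks one cell per row in a single call
dt[, purchased_was_visited := visits[cbind(seq_len(.N), col_idx)]]
```

Because the lookup is one indexing call rather than one data.table call per row, it should scale much better than the loop. If a purchased store has no matching visited_ column, match() returns NA and the indexing step returns NA for that row.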
I would give your consumers an ID column and store the data in two long-format tables, one for visits and one for purchases:
dt[, cid := .I]
# visits
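# "^visited_" avoids also matching a purchased_was_visited column if one already exists
vDT = melt(dt, id.vars = "cid", measure.vars = patterns("^visited_"), variable.name = "store")[value == 1, !"value"]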
vDT[, store := tstrsplit(store, "_")[[2]]]
vDT[, year := 2017L]
# choices
cDT = dt[, .(cid, year = 2018L, store = purchased)]
Then a non-equi join, matching visits to the same store by the same consumer in an earlier year, adds the indicator column to cDT:
cDT[, v_before := vDT[.SD, on=.(cid, store, year < year), .N, by=.EACHI]$N]
cid year store v_before
1: 1 2018 b 1
2: 2 2018 b 0
3: 3 2018 c 1
4: 4 2018 a 0
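If the indicator is wanted back on the original wide table, one option (an assumption about the desired final layout, not something the answer states) is an update join on cid. The sketch below repeats the answer's steps so it runs on its own, with melt's arguments spelled out in full, and then copies v_before onto dt.

```r
library(data.table)

dt <- data.table(
  visited_a = c(1, 1, 0, 0),
  visited_b = c(1, 0, 0, 0),
  visited_c = c(0, 0, 1, 1),
  purchased = c("b", "b", "c", "a")
)
dt[, cid := .I]

# Visits in long format, keeping only the rows where the dummy is 1
vDT <- melt(dt, id.vars = "cid", measure.vars = patterns("^visited_"),
            variable.name = "store")[value == 1, !"value"]
vDT[, store := tstrsplit(store, "_")[[2]]]   # "visited_a" -> "a"
vDT[, year := 2017L]

# Purchases in long format
cDT <- dt[, .(cid, year = 2018L, store = purchased)]

# Count earlier visits to the purchased store (0 when there is no match)
cDT[, v_before := vDT[.SD, on = .(cid, store, year < year), .N, by = .EACHI]$N]

# Update join: copy the count onto the wide table by consumer ID
dt[cDT, on = "cid", purchased_was_visited := i.v_before]
```

Joining on cid rather than relying on row order keeps the result correct even if cDT has been sorted or filtered in between.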