How to compare one column to a series of related dummy variables without a for loop in R

Question

I have working R code, but it is inelegant and inefficient. I am wondering if there is a better way: i.e. how can I vectorize this process and/or reduce the computing time?

library(data.table)
dt <- data.table(
    visited_a = c(1, 1, 0, 0),
    visited_b = c(1, 0, 0, 0),
    visited_c = c(0, 0, 1, 1),
    purchased = c("b", "b", "c", "a")
)

My data.table has dummy indicators for whether a consumer visited a store in 2017. So visited_a = 0 means she did not visit store a in 2017 while visited_b = 1 means she did visit store b in 2017. The data also list which store the consumer purchased from in 2018; all of these consumers made a purchase. Thus a consumer may or may not have visited the store (last year) that she purchased from (this year).

I want to add a variable purchased_was_visited to capture this. The desired result would be:

dt$purchased_was_visited <- c(1, 0, 1, 0)

Here is my extraordinarily inelegant code, which sadly loops through the data.table one row at a time. There must be a better way!

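# initialise as numeric so the 0/1 values fit the column type
dt[, purchased_was_visited := NA_real_]
for (i in seq_len(nrow(dt))) {
    brand <- dt[i, purchased]
    col <- paste0("visited_", brand)
    # pull the single dummy value for this row as a plain number
    was_it <- dt[[col]][i]
    dt[i, purchased_was_visited := was_it]
}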

Frank · Accepted Answer

I would give your consumers an ID column and store the data in two tables:

dt[, cid := .I]

# visits
# "^visited_" is anchored so that only the dummy columns are melted
vDT = melt(dt, id.vars = "cid", measure.vars = patterns("^visited_"), variable.name = "store")[value == 1, !"value"]
vDT[, store := tstrsplit(store, "_")[[2]]]
vDT[, year := 2017L]

# choices
cDT = dt[, .(cid, year = 2018L, store = purchased)]
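
To make the two-table layout concrete, here is a minimal self-contained sketch that rebuilds the visits table from the question's toy data and prints it. The name `visits` is used only for illustration (it plays the role of vDT above), and `sub()` is swapped in for `tstrsplit()` purely for brevity.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(
    visited_a = c(1, 1, 0, 0),
    visited_b = c(1, 0, 0, 0),
    visited_c = c(0, 0, 1, 1),
    purchased = c("b", "b", "c", "a")
)
dt[, cid := .I]

# Long format: one row per (consumer, store) pair that was actually visited.
# `visits` is an illustrative name, not one used in the answer.
visits <- melt(dt, id.vars = "cid", measure.vars = patterns("^visited_"),
               variable.name = "store")[value == 1, .(cid, store)]

# Strip the "visited_" prefix so `store` holds just the store letter
visits[, store := sub("^visited_", "", store)]

print(visits)
```

Each row of the long table is one visit, so unvisited stores simply do not appear, and the later join only has to count matching rows.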

Then you can do a join to add the indicator column to cDT:

cDT[, v_before := vDT[.SD, on=.(cid, store, year < year), .N, by=.EACHI]$N]

   cid year store v_before
1:   1 2018     b        1
2:   2 2018     b        0
3:   3 2018     c        1
4:   4 2018     a        0
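
If restructuring into two tables is more than you need, a different technique is base R matrix indexing on the original wide table. This is not the join shown above, only a hedged sketch that assumes every dummy column is named `visited_<store>`. The helper names `visit_cols`, `stores`, `visit_mat` and `idx` are hypothetical.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(
    visited_a = c(1, 1, 0, 0),
    visited_b = c(1, 0, 0, 0),
    visited_c = c(0, 0, 1, 1),
    purchased = c("b", "b", "c", "a")
)

# Dummy columns and the store letter each one stands for
# (assumes the naming scheme visited_<store>)
visit_cols <- grep("^visited_", names(dt), value = TRUE)
stores <- sub("^visited_", "", visit_cols)

# Numeric matrix of dummies: one row per consumer, one column per store
visit_mat <- as.matrix(dt[, ..visit_cols])

# Two-column index matrix: (row number, column of the purchased store)
idx <- cbind(seq_len(nrow(dt)), match(dt$purchased, stores))

# Indexing a matrix by a two-column matrix picks one cell per row
dt[, purchased_was_visited := visit_mat[idx]]

print(dt)
```

On the toy data this gives 1, 0, 1, 0, matching the desired result in the question. A purchased store with no corresponding dummy column yields NA rather than an error.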

Tags: for-loop, r, data.table

DanY
