R data.table binary value for last row in group by condition

Tags: r, data.table

I have data like this:

library(data.table)
id <- c("1232","1232","1232","4211","4211","4211")
conversion <- c(0,0,0,1,1,1)
DT <- data.table(id, conversion)

id   date         conversion
1232 2018-01-01   0
1232 2018-01-03   0
1232 2018-01-04   0
4211 2018-04-01   1
4211 2018-04-04   1
4211 2018-04-06   1

I would like to create a binary value for only the last row of each group, grouping by the id column. The binary would be 1 only when conversion is 1 for the group.

id   date         conversion  lastconv
1232 2018-01-01   0           0
1232 2018-01-03   0           0 
1232 2018-01-04   0           0
4211 2018-04-01   1           0
4211 2018-04-04   1           0
4211 2018-04-06   1           1

I've tried a few examples that use the "mult" parameter in data.table, but they have only returned errors.

DT[unique(id), lastconv := 1, mult = "last"]
ericbrownaustin asked May 10 '19 22:05


2 Answers

Modifying the OP's code to join on the last row of each group:

DT[, v := 0]
DT[.(DT[conversion == 1, unique(id)]), on=.(id), mult="last", v := 1]

     id conversion v
1: 1232          0 0
2: 1232          0 0
3: 1232          0 0
4: 4211          1 0
5: 4211          1 0
6: 4211          1 1

The main difference from the OP's code is that it selects which ids to edit based on the desired condition (conversion == 1).
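
As a rough sketch, the same join can be split into separate steps to show where the condition enters. The intermediate name ids_to_flag and the explicit `id =` label inside `.()` are assumptions made for illustration and are not part of the answer; the data is the example from the question.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  id = c("1232", "1232", "1232", "4211", "4211", "4211"),
  conversion = c(0, 0, 0, 1, 1, 1)
)

# Step 1: start every row at 0
DT[, v := 0]

# Step 2: collect the ids that meet the condition
# (ids_to_flag is a hypothetical helper name)
ids_to_flag <- DT[conversion == 1, unique(id)]

# Step 3: join those ids back onto DT; mult = "last" keeps only the
# last matching row per id, and := updates that row by reference
DT[.(id = ids_to_flag), on = .(id), mult = "last", v := 1]

print(DT)
```

Note that this flags the last row of any id that has at least one row with conversion == 1, which matches the question's data, where conversion is constant within each id.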

Frank answered Oct 19 '22 17:10


For each id, check if row number is the last row number in the group, and if 'conversion' is 1. Convert logical result to integer.

DT[ , lastconv := as.integer(.I == .I[.N] & conversion == 1), by = id]
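
A minimal, self-contained sketch of this approach on the question's data is below. The second column, lastconv_alt, is an assumption added for comparison only: it uses the within-group position (seq_len(.N) == .N) instead of the table-wide row number .I, and should give the same result.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  id = c("1232", "1232", "1232", "4211", "4211", "4211"),
  conversion = c(0, 0, 0, 1, 1, 1)
)

# .I holds the row numbers of the current group within the full table,
# so .I[.N] is the row number of the group's last row
DT[, lastconv := as.integer(.I == .I[.N] & conversion == 1), by = id]

# Alternative for comparison (not from the answer): compare the
# position within the group against the group size .N
DT[, lastconv_alt := as.integer(seq_len(.N) == .N & conversion == 1), by = id]

print(DT)
```

Because the check uses the conversion value of the last row itself, this differs from a group-level condition only if conversion varies within an id.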
Henrik answered Oct 19 '22 18:10