
R how to identify distance of last occurrence

Tags: r, data.table

I want to calculate how long it has been since something last occurred.

Given the following, you can see that the light is on some of the time, but not all of the time. I want to normalize the data to feed it to a neural network.

library(data.table)
d<-data.table(
    date = c("6/1/2013", "6/2/2013","6/3/2013","6/4/2013"),
    light = c(TRUE,FALSE,FALSE,TRUE) 
)
d
       date light
1: 6/1/2013  TRUE
2: 6/2/2013 FALSE
3: 6/3/2013 FALSE
4: 6/4/2013  TRUE

What I'd like to calculate is another column that shows the "distance" to the last occurrence.

So for the data above: the first row should be 0, since the light is on; the second row should be 1; the third row should be 2; and the fourth row should be 0, since the light is on again.

Asked by eAndy, Jul 08 '13 03:07


1 Answer

I would suggest creating a grouping column in which every TRUE row starts a new group, so that each run of FALSE rows shares a group with the TRUE row that precedes it:

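# create group column: number the TRUE rows 1, 2, 3, ...
d[c(light), group := cumsum(light)]
# the FALSE rows were left as NA; set them to 0
d[is.na(group), group := 0L]
# running sum: each TRUE row and the FALSE rows after it now share one id
d[, group := cumsum(group)]
d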
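
As a side note, the three steps above can probably be collapsed into one. The following is a minimal sketch, not the answer's original code: it assumes only the partition of rows into groups matters, not the group labels themselves. The names `cmp`, `long` and `short` are made up for illustration.

```r
library(data.table)

# Illustrative copy of the question's light column (cmp is a hypothetical name)
cmp <- data.table(light = c(TRUE, FALSE, FALSE, TRUE))

# Three-step construction, as above
cmp[c(light), long := cumsum(light)]
cmp[is.na(long), long := 0L]
cmp[, long := cumsum(long)]

# One-step alternative: a running count of TRUE rows
cmp[, short := cumsum(light)]

# rleid() relabels consecutive runs as 1, 2, 3, ..., so identical results
# mean both columns split the rows into the same groups
same_partition <- identical(rleid(cmp$long), rleid(cmp$short))
print(cmp)
print(same_partition)
```

The labels differ (the three-step version skips numbers), but a `by=` grouping only cares which rows belong together.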

Then simply tally by group, using cumsum and negating light:

d[, distance := cumsum(!light), by=group]

# remove the group column for cleanliness
d[, group := NULL]

Results:

d

         date light distance
1: 2013-06-01  TRUE        0
2: 2013-06-02 FALSE        1
3: 2013-06-03 FALSE        2
4: 2013-06-04  TRUE        0
5: 2013-06-05  TRUE        0
6: 2013-06-06 FALSE        1
7: 2013-06-07 FALSE        2
8: 2013-06-08  TRUE        0

Note that I added a few rows to the sample data.
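
For reference, here is a minimal self-contained sketch that should reproduce the results table. The `light` values and dates are read off that table. Storing the dates as a `Date` column is an assumption, and `ex` and `grp` are illustrative names. Grouping directly inside `by` is a substitution for the temporary `group` column, not the exact code used above.

```r
library(data.table)

# Extended sample data, values taken from the results table
ex <- data.table(
    date  = seq(as.Date("2013-06-01"), by = "day", length.out = 8),
    light = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Every TRUE row starts a new group; within a group, count the FALSE rows so far.
# Grouping in `by` avoids creating and then deleting a helper column.
ex[, distance := cumsum(!light), by = .(grp = cumsum(light))]

print(ex)
```

If the data begins with FALSE rows, there is no earlier occurrence to measure from, so those rows are counted from the start of the data instead.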

Answered by Ricardo Saporta, Sep 20 '22 23:09