
Calculate differences between rows faster than a for loop?

Tags: r

I have a data set that looks like this:

ID   |   DATE    | SCORE
-------------------------
123  |  1/15/10  |  10
123  |  1/1/10   |  15
124  |  3/5/10   |  20
124  |  1/5/10   |  30
...

So to load the above snippet as a data frame, the code is:

id<-c(123,123,124,124)
date<-as.Date(c('2010-01-15','2010-01-01','2010-03-05','2010-01-05'))
score<-c(10,15,20,30)
data<-data.frame(id,date,score)


I'm trying to add a column that calculates the "days since last record for this ID".

Right now I'm using a FOR loop that looks something like this:

data$dayssincelast <- rep(NA, nrow(data))
for(i in 2:nrow(data)) {
  if(data$id[i] == data$id[i-1]) 
    data$dayssincelast[i] <- data$date[i] - data$date[i-1]
}


Is there a faster way to do this? (I've looked a bit into APPLY but can't quite figure out a solution besides a FOR loop.)

Thanks in advance!

Dave Guarino asked Nov 27 '12 19:11



1 Answer

This should work as long as the dates are in order within each id; the code below sorts by id and date first to guarantee that.

id<-c(123,123,124,124)
date<-as.Date(c('2010-01-15','2010-01-01','2010-03-05','2010-01-05'))
score<-c(10,15,20,30)
data<-data.frame(id,date,score)

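# Sort by id, then date, so diff() compares consecutive records within each id
data <- data[order(data$id,data$date),]
# For each id, compute the gap in days between consecutive dates;
# the first record of each id has no predecessor, so it gets NA
data$dayssincelast<-do.call(c,by(data$date,data$id,function(x) c(NA,diff(x))))
# Or, even more concisely
data$dayssincelast<-unlist(by(data$date,data$id,function(x) c(NA,diff(x))))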
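
As an alternative sketch (not the code above, and only an assumption about what "faster" should mean here), the loop from the question can be replaced by whole-vector operations in base R: compute every consecutive date gap at once with diff(), then blank out the gaps that cross an id boundary. The column name dayssincelast_ave and the helper variables n, gap and same_id are made up for illustration. A second variant uses ave(), which applies a function within each id group and keeps the original row order.

```r
# Same sample data as in the question
id <- c(123, 123, 124, 124)
date <- as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
score <- c(10, 15, 20, 30)
data <- data.frame(id, date, score)

# Sort so that each id's records are in chronological order
data <- data[order(data$id, data$date), ]

# Variant 1: fully vectorized, mirrors the logic of the original loop
n <- nrow(data)
gap <- c(NA, diff(as.numeric(data$date)))        # days between row i-1 and row i
same_id <- c(FALSE, data$id[-1] == data$id[-n])  # TRUE if row i-1 has the same id
data$dayssincelast <- ifelse(same_id, gap, NA)   # keep gaps only within an id

# Variant 2: ave() computes the gaps per id group
data$dayssincelast_ave <- ave(as.numeric(data$date), data$id,
                              FUN = function(x) c(NA, diff(x)))

print(data$dayssincelast)  # NA 14 NA 59
```

Both variants assume the date column is of class Date, so that as.numeric() yields whole days. Variant 1 also assumes the data frame has at least two rows. Whether either is actually faster than by() on a large data set would need to be measured, for example with system.time().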
nograpes answered Sep 22 '22 14:09
