
How to vectorize a "for" loop that returns a vector after applying a function for each ID

Tags:

r

I am interested in finding a way to vectorize (using ddply or some other apply function) the following:

day = seq(0,100,20)
d = data.frame(id=rep(seq(1:10),each=length(day)))
d$s = rnorm(nrow(d),0,1)
d$diffS = NA
for(i in unique(d$id)) {
  d$diffS[d$id==i] = c(0,diff(d$s[d$id==i]))
}

Essentially, I am looking for a cleverer way of taking a subset of a data frame by ID, applying a function that returns a vector, and adding the result back to the data frame. I thought maybe the "by" function would work, but I can't figure it out.

David S asked Jun 08 '15 18:06


3 Answers

You can try one of the functions that apply a computation by group, such as ave from base R:

d$diffS <- with(d, ave(s, id, FUN=function(x) c(0, diff(x))))

Or

library(dplyr)
d %>% 
   group_by(id) %>%
   mutate(diffS= c(0, diff(s)))

Or

library(data.table)#v1.9.5+
setDT(d)[, diffS:= c(0, diff(s)), by = id]

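As @Arun mentioned in the comments, the devel version of 'data.table' has shift, which would be more efficient (shift has since been released on CRAN, in v1.9.6 and later). Instructions to install the devel version are here. Note that the lag is filled with the first value of each group, so that the first difference is 0 and the result matches c(0, diff(s)):

setDT(d)[, diffS := s - shift(s, fill = s[1L]), by = id]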
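
A quick sanity check, using only base R, that a grouped computation reproduces the loop from the question. This is a minimal sketch, not part of the original answer: the seed and the helper name first_zero_diff are assumptions added for illustration, and the lag is built by hand rather than with shift.

```r
set.seed(1)  # assumed seed, only so the check is reproducible
day <- seq(0, 100, 20)
d <- data.frame(id = rep(1:10, each = length(day)))
d$s <- rnorm(nrow(d))

# Reference result: the explicit loop from the question
loop_res <- rep(NA_real_, nrow(d))
for (i in unique(d$id)) {
  loop_res[d$id == i] <- c(0, diff(d$s[d$id == i]))
}

# Hypothetical helper: subtract the lagged series, with the lag's first
# slot filled by the series' own first value so the first difference is 0
first_zero_diff <- function(x) x - c(x[1], head(x, -1))

ave_res <- ave(d$s, d$id, FUN = function(x) c(0, diff(x)))
lag_res <- ave(d$s, d$id, FUN = first_zero_diff)

# Both grouped versions should agree with the loop
stopifnot(isTRUE(all.equal(loop_res, ave_res)),
          isTRUE(all.equal(loop_res, lag_res)))
```

The same all.equal comparison can be used to check any of the other approaches against the loop before relying on them.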

akrun answered Nov 07 '22 15:11


This could also be achieved by the following

Using ddply

library('plyr')
out = ddply(d, .(id), mutate, diffs = c(0,diff(s)))

Or tapply

d$diffs = unlist(tapply(d$s, d$id, function(x) c(0, diff(x))))

Or lapply

out = do.call(rbind, 
      lapply(split(d, f = d$id), 
      function(x){x$diffs = c(0,diff(x$s)); x}))

Or sapply

library('reshape')
d$diffs = melt(sapply(split(d, d$id), function(x) c(0, diff(x$s))))$value
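
One caveat with the tapply and sapply variants above: the per-group results are concatenated in sorted id order, so assigning them straight back is only correct when the rows are already grouped by id in that order (as they are in the question's data). A minimal base R sketch, using a small made-up data frame and a helper named first_diff (both assumptions for illustration), that shows the mismatch and one order-preserving alternative via split and unsplit:

```r
# Made-up example data with interleaved ids (not from the question)
d <- data.frame(id = c(2, 1, 2, 1), s = c(10, 1, 13, 5))

# Hypothetical helper: within-group difference with a leading 0
first_diff <- function(x) c(0, diff(x))

# Concatenates group 1 then group 2, ignoring the original row order
by_tapply <- unname(unlist(tapply(d$s, d$id, first_diff)))

# split/unsplit puts each group's values back into its original rows
by_unsplit <- unsplit(lapply(split(d$s, d$id), first_diff), d$id)

# ave does the same bookkeeping internally
by_ave <- ave(d$s, d$id, FUN = first_diff)

print(by_tapply)   # 0 4 0 3 (misaligned with the rows of d)
print(by_unsplit)  # 0 0 3 4 (aligned with the rows of d)
```

Sorting the data frame by id beforehand is another way to make the unlist-based variants safe.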

Veerendra Gadekar answered Nov 07 '22 15:11


Because you mentioned the function by:

using_by <- with(d, by(s, id, FUN=function(x) c(0, diff(x))))

It isn't recommended because of the output layout: by returns an object of class "by" that prints one block per group, which does not attach directly to a data frame:

id: 1
[1]  0.0000000  1.7884528  0.8135887  0.1891395 -0.6823383
[6] -2.6844915
--------------------------------------------- 
id: 2
[1]  0.0000000 -0.0258939 -0.8095359  0.5238898 -1.0345254
[6]  1.5432667

To attach it to the data, an extra step is needed (this relies on the rows already being ordered by id, as they are here):

d$diffS <- unname(unlist(using_by))
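
If the rows are not already ordered by id, the unlist step would misalign the values. A hedged sketch of one way around this is to sort by id first; order leaves ties in their original order, so the row order within each group is kept. The small data frame is made up for illustration and is not from the question.

```r
# Made-up example data with interleaved ids (not from the question)
d <- data.frame(id = c(2, 1, 2, 1), s = c(10, 1, 13, 5))

# Sort by id so the groups are contiguous and in the order by() uses
d <- d[order(d$id), ]

using_by <- with(d, by(s, id, FUN = function(x) c(0, diff(x))))

# Flatten the per-group vectors and drop the generated names
d$diffS <- unname(unlist(using_by))

print(d)
```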

Pierre L answered Nov 07 '22 16:11