I am interested in finding a way to vectorize (using ddply or some other apply function) the following:
day = seq(0,100,20)
d = data.frame(id=rep(1:10,each=length(day)))
d$s = rnorm(nrow(d),0,1)
d$diffS = NA
for(i in unique(d$id)) {
  d$diffS[d$id==i] = c(0,diff(d$s[d$id==i]))
}
Essentially I am looking for a more clever way of subsetting a data frame by ID, applying a function that returns a vector, and adding the result back to the data frame. I thought the "by" function might work, but I can't figure it out.
You can try one of the aggregating functions, for example base R's ave:
d$diffS <- with(d, ave(s, id, FUN=function(x) c(0, diff(x))))
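A quick check, as a minimal sketch: ave() applies FUN to each id group and writes the results back into the original row positions, so it should reproduce the loop from the question exactly. The fixed seed is an assumption added only to make the example reproducible.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

day <- seq(0, 100, 20)
d <- data.frame(id = rep(1:10, each = length(day)))
d$s <- rnorm(nrow(d), 0, 1)

# Reference result: the loop from the question
loop_diff <- rep(NA_real_, nrow(d))
for (i in unique(d$id)) {
  loop_diff[d$id == i] <- c(0, diff(d$s[d$id == i]))
}

# ave() splits s by id, applies FUN to each piece and puts the pieces
# back where they came from, so the result is aligned with the rows of d
ave_diff <- with(d, ave(s, id, FUN = function(x) c(0, diff(x))))

all.equal(ave_diff, loop_diff)  # TRUE
```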
Or
library(dplyr)
d <- d %>%
  group_by(id) %>%
  mutate(diffS = c(0, diff(s)))
Or
library(data.table) # v1.9.5+
setDT(d)[, diffS:= c(0, diff(s)), by = id]
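As @Arun mentioned in the comments, 'shift' is more efficient. At the time it was only in the development version of 'data.table'; it has been part of CRAN releases since v1.9.6. Use the group's first value as the fill, so that the first row of each id gets 0, as in the loop:
setDT(d)[, diffS := s - shift(s, fill = s[1L]), by = id]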
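Why the fill value matters: subtracting a lagged copy of s gives c(0, diff(s)) only if the padded first position equals the group's first value. The base R sketch below uses a hypothetical helper, lag1(), that mimics a one-step lag such as shift(x, n = 1, fill = ...); it is not part of data.table.

```r
# Hypothetical helper: lag x by one position, padding the front with `fill`
lag1 <- function(x, fill) c(fill, x[-length(x)])

x <- c(1, 3, 6)

x - lag1(x, fill = 0)       # 1 2 3: the first element is x[1], not 0
x - lag1(x, fill = x[1L])   # 0 2 3: same as c(0, diff(x))
c(0, diff(x))               # 0 2 3
```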
This could also be achieved with plyr or the base apply family.
Using ddply:
library('plyr')
out = ddply(d, .(id), mutate, diffS = c(0,diff(s)))
Or tapply (this assumes the rows are sorted by id, because tapply returns the groups in id order):
d$diffS = unlist(tapply(d$s, d$id, function(x) c(0, diff(x))))
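The sorting assumption matters. In the small hypothetical data set below, the ids are interleaved, so unlist(tapply(...)) returns the values grouped by id rather than in row order. Base R's split()/unsplit() pair, a swapped-in alternative not used in the answers here, writes each group's results back to that group's rows, which is the same mechanism ave() relies on.

```r
f <- function(x) c(0, diff(x))

# Hypothetical data: rows of the two ids are interleaved
d <- data.frame(id = c(1, 2, 1, 2, 1, 2),
                s  = c(1, 10, 3, 15, 6, 21))

# unlist(tapply(...)) concatenates group 1's result, then group 2's
by_group <- unname(unlist(tapply(d$s, d$id, f)))
by_group      # 0 2 3 0 5 6  (not aligned with the rows of d)

# unsplit() puts each group's result back into that group's rows
by_row <- unsplit(lapply(split(d$s, d$id), f), d$id)
by_row        # 0 0 2 5 3 6  (row 3 is id 1: 3 - 1 = 2, and so on)
```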
Or lapply:
out = do.call(rbind,
              lapply(split(d, f = d$id),
                     function(x){x$diffS = c(0,diff(x$s)); x}))
Or sapply (again assuming the rows are sorted by id):
library('reshape')
d$diffS = melt(sapply(split(d, d$id), function(x) c(0, diff(x$s))))$value
Because you mentioned the function by:
using_by <- with(d, by(s, id, FUN=function(x) c(0, diff(x))))
It isn't recommended, though, because of its output layout, which isn't convenient to attach to a data frame:
id: 1
[1] 0.0000000 1.7884528 0.8135887 0.1891395 -0.6823383
[6] -2.6844915
---------------------------------------------
id: 2
[1] 0.0000000 -0.0258939 -0.8095359 0.5238898 -1.0345254
[6] 1.5432667
To attach it to the data, an extra step is needed (again assuming the rows are sorted by id):
d$diffS <- unname(unlist(using_by))
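If the goal is strictly vectorized code, the grouping can be skipped altogether when each id's rows are contiguous and in time order, as in the question's data: take the differences over the whole column, then reset the first row of every id to 0. This is a swapped-in technique rather than one of the answers above, shown on a small hypothetical data set.

```r
# Hypothetical data: each id's rows are contiguous, as in the question
d <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                s  = c(1, 3, 6, 10, 15, 21))

# Differences between consecutive rows, ignoring id for the moment
d$diffS <- c(0, diff(d$s))       # 0 2 3 4 5 6

# Row 4 picked up a difference across the id boundary (10 - 6 = 4);
# the first row of every id is reset to 0
d$diffS[!duplicated(d$id)] <- 0  # 0 2 3 0 5 6

d
```

Unlike ave(), this gives wrong results if the rows are not grouped by id, so sort first (for example with d[order(d$id), ], which keeps the original order within each id) when in doubt.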