ddply: how to include a character vector in result

Tags: r, plyr

Sorry for the cryptic title; I couldn't find a better summary of my problem. Here it is: I have a data frame and want to apply diff() within groups, which works fine:

library(plyr)

df <- data.frame(name  = rep(c("a", "b", "c"), 4),
                 index = rep(c("c1", "c2"), each = 6),
                 year  = rep(2008:2010, 4),
                 value = rep(1:3, each = 4))

head(df)

  name index year value
1    a    c1 2008     1
2    b    c1 2009     1
3    c    c1 2010     1

ddply(df, .(name, year), summarize,  value=diff(value))

However, I would like to include index in my result, which I tried to do with:

ddply(df, .(name, year), summarize,  value=diff(value), index=index)

Yet this yields the error message:

length(rows) == 1 is not TRUE

I guess this is because index has more rows than the output of diff(), since it is not processed by diff(). Is there a quick solution to my problem?
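The guess above is right. Within each group, diff() returns one value fewer than its input, while index keeps every row, so the summary columns end up with different lengths and cannot be assembled into one data frame. A minimal base-R sketch of the mismatch and of the usual remedy, dropping the first element of each pass-through column with [-1] (the numbers are the group "a" values from the simple example in the EDIT):

```r
# Group "a" from the simple example in the question's EDIT
value <- c(10, 30, 40)
index <- c("c1", "c2", "c1")

length(diff(value))  # 2: diff() returns n - 1 values
length(index)        # 3: index still has n values, hence the mismatch

# Dropping the first element of index lines it up with the differences
result <- data.frame(index = index[-1], d.value = diff(value))
result
```

In the asker's call, the same idea would read `ddply(df, .(name, year), summarize, value = diff(value), index = index[-1])`, assuming the rows within each group are already in the order you want to difference.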

Thank you very much!

EDIT

Let me try to clarify what I want to add to the result:

Take the variable index above. It is a factor that ought to explain something, but taking diff() of it would not make sense, so I just want to pass it through unchanged. I tried drop = FALSE, which yielded the same error message.

Sorry for all this confusion! Here's a very simple example:

name year  index  value
 a   2008    c1    10
 a   2009    c2    30
 a   2010    c1    40

After taking diffs across group 'a', this looks like:

name year index d.value
 a   2009  c2     +20  # c2 is passed through unchanged; only the first row is intentionally dropped
 a   2010  c1     +10

Think of the (unfortunately named) index as an attribute: it can change over the years, but taking diff() of it would not make sense.
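In this simple example the differences run across years within each name, so the grouping is by name alone (not name and year) and rows must be sorted by year first. A minimal sketch that reproduces the desired output, using base R's split(), lapply() and do.call(rbind, ...) as a stand-in for ddply(); diff_group is a hypothetical helper name, not part of any package:

```r
# The three-row example from the question
df_a <- data.frame(name  = "a",
                   year  = 2008:2010,
                   index = c("c1", "c2", "c1"),
                   value = c(10, 30, 40))

# Hypothetical helper: differences within one group, carrying the
# pass-through columns (name, year, index) from rows 2..n
diff_group <- function(g) {
  g <- g[order(g$year), ]  # diffs run forward in time
  data.frame(name    = g$name[-1],
             year    = g$year[-1],
             index   = g$index[-1],
             d.value = diff(g$value))
}

# Base-R stand-in for grouping by name only, then stacking the results
res <- do.call(rbind, lapply(split(df_a, df_a$name), diff_group))
rownames(res) <- NULL
res
```

A one-row group simply yields zero rows here, since both `[-1]` and diff() return empty vectors.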

I really hope this gives you a clue what I want. If not, I'll delete the question, since I found an inelegant workaround ;) Sorry for all the inconvenience!

asked Nov 05 '22 08:11 by Seb


1 Answer

I'm not entirely sure what you want, but it sounds like you want to get diffs, keep the index variable, and drop the first row of each group. Does this get you what you want?

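library(plyr)

doSummary = function(df) {
  # diff() returns one value fewer than the group has rows,
  values = diff(df$value)
  # so keep index for rows 2..n only; [-1] drops the first row
  # (length(df) would count columns, not rows)
  indexes = df$index[-1]
  data.frame(d.value=values, index=indexes)
}
ddply(df, .(name, year), doSummary)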
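Selecting rows 2..n with `2:length(df)` is fragile: for a data frame, length() counts columns, not rows. In the question's df every (name, year) group happens to have 4 rows, the same as its 4 columns, so that expression picks the right rows there by coincidence. A short base-R check (g is just an illustrative subset):

```r
df <- data.frame(name  = rep(c("a", "b", "c"), 4),
                 index = rep(c("c1", "c2"), each = 6),
                 year  = rep(2008:2010, 4),
                 value = rep(1:3, each = 4))

length(df)  # 4: the number of columns
nrow(df)    # 12: the number of rows

# One (name, year) group: rows 2..n via nrow() or via negative indexing
g <- df[df$name == "a" & df$year == 2008, ]
nrow(g)     # 4, equal to length(df) only by coincidence
identical(g$index[2:nrow(g)], g$index[-1])  # TRUE
```

Negative indexing with `[-1]` also behaves sensibly for a one-row group, where `2:nrow(g)` would evaluate to `2:1` and select the wrong elements.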
answered Nov 09 '22 07:11 by rory