I am using ddply to aggregate and summarize data frame variables, and I would like to loop over a list of my data frame's columns to create the new variables, instead of writing each one out as I do now:
new.data <- ddply(old.data,
                  c("factor", "factor2"),
                  function(df)
                    c(a11_a10 = CustomFunction(df$a11_a10),
                      a12_a11 = CustomFunction(df$a12_a11),
                      a13_a12 = CustomFunction(df$a13_a12),
                      ...
                      ...
                      ...))
Is there a way for me to insert a loop in ddply so that I can avoid writing each new summary variable out, e.g.
for (i in 11:n) {
paste("a", i, "_a", i - 1) = CustomFunction(..... )
}
I know that this is not how it would actually be done, but I just wanted to show how I'd conceptualize it. Is there a way to do this in the function I call in ddply, or via a list?
UPDATE: Because I'm a new user, I can't post an answer to my own question:
My answer involves ideas from Nick's answer and Ista's comment:
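func <- function(old.data, min, max, gap) {
  varrange <- min:max
  # Build column names such as "a11_a10" from the index range
  usenames <- paste("a", varrange, "_a", varrange - gap, sep = "")
  # Return the summary directly so the result is visible to the caller
  ddply(old.data,
        .(factor, factor2),
        colwise(CustomFunction, usenames))
}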
Building on the excellent answer by @Nick, here is one approach to the problem:
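foo <- function(df) {
  # Column names to summarize, e.g. "a11_a10"; n must be defined beforehand
  cols <- paste("a", 11:n, "_a", 10:(n - 1), sep = "")
  # drop = FALSE keeps a data frame even when only one column is selected
  sapply(df[, cols, drop = FALSE], CustomFunction)
}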
new.data = ldply(dlply(old.data, c("factor", "factor2")), foo)
Here is an example application using the tips dataset in ggplot2. Suppose we want to calculate the average of tip and total_bill by combination of sex and smoker. Here is how the code would work:
foo = function(df){names = c("tip", "total_bill"); sapply(df[,names], mean)}
new = ldply(dlply(tips, c("sex", "smoker")), foo)
It produces the output shown below:
         .id      tip total_bill
1  Female.No 2.773519   18.10519
2 Female.Yes 2.931515   17.97788
3    Male.No 3.113402   19.79124
4   Male.Yes 3.051167   22.28450
Is this what you were looking for?
If I understand you correctly, you essentially want to apply a custom function to every column in the data.frame that you pass to ddply.
The good news is that plyr has a helper, colwise, that does exactly that. This means the solution to your problem boils down to a one-liner.
Building on the excellent example of @Ramnath:
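library(plyr)     # provides ddply and numcolwise
library(ggplot2)  # if tips is not found afterwards, try data(tips, package = "reshape2")
customfunction <- mean
ddply(tips, .(sex, smoker), numcolwise(customfunction))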
     sex smoker total_bill      tip     size
1 Female     No   18.10519 2.773519 2.592593
2 Female    Yes   17.97788 2.931515 2.242424
3   Male     No   19.79124 3.113402 2.711340
4   Male    Yes   22.28450 3.051167 2.500000
The reason this works is that colwise turns a function that works on a vector into a function that works on a column in a data.frame. There are two variants of colwise: numcolwise works only on numeric columns, and catcolwise works on categorical columns. See ?colwise for more information.
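If you only want the function applied to a named set of columns, colwise also takes a column selection as its second argument, which is what the asker's own update uses. Here is a minimal sketch of that on toy data. The data frame, its values, and the use of mean as a stand-in for CustomFunction are assumptions made up for illustration:

```r
library(plyr)

# Toy data (hypothetical); column names follow the a<i>_a<i-1> pattern
old.data <- data.frame(
  factor  = c("x", "x", "y", "y"),
  factor2 = c("p", "p", "q", "q"),
  a11_a10 = c(1, 3, 5, 7),
  a12_a11 = c(2, 4, 6, 8),
  other   = c(10, 20, 30, 40)
)

CustomFunction <- mean  # stand-in for the real summary function

# Build the column names "a11_a10" and "a12_a11"
usenames <- paste("a", 11:12, "_a", 10:11, sep = "")

# Summarize only the named columns within each factor/factor2 group
new.data <- ddply(old.data, .(factor, factor2),
                  colwise(CustomFunction, usenames))
print(new.data)
```

The column called other is left out of the result because it is not listed in usenames.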
EDIT:
I appreciate that you may not wish to apply the function to all columns in your data.frame. Still, I find this syntax so easy that my general approach would be to modify the data.frame that I pass to ddply. For example, the following modified example subsets tips to exclude some columns. The solution is still a one-liner:
ddply(tips[, -2], .(sex, smoker), numcolwise(customfunction))
     sex smoker total_bill     size
1 Female     No   18.10519 2.592593
2 Female    Yes   17.97788 2.242424
3   Male     No   19.79124 2.711340
4   Male    Yes   22.28450 2.500000
In steps:
varrange <- 11:n
usenames <- paste("a", varrange, "_a", varrange - 1, sep = "")
results <- sapply(usenames, function(curname) CustomFunction(df[, curname]))
names(results) <- usenames
Is this what you want?
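To try the steps above outside ddply, here is a small sketch on a toy data frame. The values in df, the choice of n = 13, and mean as a stand-in for CustomFunction are all assumptions for illustration only:

```r
# Toy data (hypothetical); one column per a<i>_a<i-1> name
df <- data.frame(
  a11_a10 = c(1, 2, 3),
  a12_a11 = c(4, 5, 6),
  a13_a12 = c(7, 8, 9)
)
n <- 13                 # assumed upper index
CustomFunction <- mean  # stand-in for the real summary function

varrange <- 11:n
usenames <- paste("a", varrange, "_a", varrange - 1, sep = "")

# One summary value per column, named after the column it came from
results <- sapply(usenames, function(curname) CustomFunction(df[, curname]))
names(results) <- usenames
print(results)
```

Inside ddply, the same lines would go in the body of the function that receives each group's subset as df.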