It's easy to grab one or more columns in ddply to process, but is there a way to grab the entire current row and pass that on to a function? Or to grab a set of columns determined at runtime?
Let me illustrate:
Given a dataframe like
df = data.frame(a=seq(1,20), b=seq(1,5), c= seq(5,1))
df
a b c
1 1 1 5
2 2 2 4
3 3 3 3
I could write a function to sum named columns along a row of a data frame like this:
selectiveSummer = function(row,colsToSum) {
return(sum(row[,colsToSum]))
}
It works when I call it for a row like this:
> selectiveSummer(df[1,],c('a','c'))
[1] 6
So I'd like to wrap that in an anonymous function and use it in ddply to apply it to every row in the table, something like the example below:
f = function(x) { selectiveSummer(x,c('a','c')) }
#this doesn't work!
ddply(df,.(a,b,c), transform, foo=f(row))
I'd like to find a solution where the set of columns to manipulate can be determined at runtime, so if there's some way just to splat that from ddply's args and pass it into a function that takes any number of args, that works too.
Edit: To be clear, the real application driving this isn't sum, but this was an easier explanation
You can only select single rows with ddply if the rows can be uniquely identified by one or more variables. If there are identical rows, ddply will pass data frames of multiple rows to your function, even if you group by all columns (like ddply(df, names(df), f)).
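If you would rather stay with ddply, one possible workaround is to add a column that makes every row unique and group on it. This is only a sketch: it assumes plyr is installed, and the `id` column and the names `df_id` and `out` are made up for illustration.

```r
library(plyr)

df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

selectiveSummer <- function(row, colsToSum) {
  sum(row[, colsToSum])
}

# Columns to operate on; this vector could be built at runtime.
colsToSum <- c("a", "c")

# Hypothetical helper column: a row identifier so each group is exactly one row.
df_id <- cbind(df, id = seq_len(nrow(df)))

# Each piece handed to the function is now a one-row data frame.
out <- ddply(df_id, .(id), function(row) {
  cbind(row, foo = selectiveSummer(row, colsToSum))
})
```

Because `id` is unique, duplicate rows in the other columns no longer get merged into one group.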
Why not use apply instead? Apply does iterate over individual rows.
apply(df, 1, function(x) f(as.data.frame(t(x))))
result:
[1] 6 6 6 6 6 11 11 11 11 11 16 16 16 16 16 21 21 21 21 21
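Since the question also asks about choosing the columns at runtime, here is a rough base R sketch of that variant. It iterates over row indices instead of using apply, because apply converts the data frame to a matrix first, which would turn everything into character if the columns had mixed types. The `colsToSum` variable and the `foo` column name are just taken from the question for illustration.

```r
df <- data.frame(a = seq(1, 20), b = seq(1, 5), c = seq(5, 1))

selectiveSummer <- function(row, colsToSum) {
  sum(row[, colsToSum])
}

# Column names decided at runtime (hard-coded here for the example).
colsToSum <- c("a", "c")

# Pass each row to the function as a one-row data frame,
# so column names and types are preserved.
df$foo <- sapply(seq_len(nrow(df)), function(i) {
  selectiveSummer(df[i, , drop = FALSE], colsToSum)
})

head(df$foo)
# [1]  6  6  6  6  6 11
```

For a plain sum, rowSums(df[, colsToSum]) would be simpler, but the loop above works for any function that expects a one-row data frame.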