I've been trying to understand what plyr does and how it works by trying different variables and functions and seeing what results. So I'm looking more for an explanation of how plyr works than for specific fix-it answers. I've read the documentation, but my newbie brain still isn't getting it.
Some data and names:
mydf <- data.frame(c("a","a","b","b","c","c"), c("e","e","e","e","e","e"), c(1,2,3,10,20,30), c(5,10,20,20,15,10))
colnames(mydf) <- c("Model", "Class", "Length", "Speed")
mydf
Question 1: Summarise versus Transform Syntax
So if I enter:
ddply(mydf, .(Model), summarise, sum = Length+Length)
I get:
  Model ..1
1     a   2
2     a   4
3     b   6
4     b  20
5     c  40
6     c  60
And if I enter:
ddply(mydf, .(Model), summarise, Length+Length)
I get the same result.
Now if I use transform:
ddply(mydf, .(Model), transform, sum = (Length+Length))
I get:
  Model Class Length Speed sum
1     a     e      1     5   2
2     a     e      2    10   4
3     b     e      3    20   6
4     b     e     10    20  20
5     c     e     20    15  40
6     c     e     30    10  60
But if I write it like the first summarise call:
ddply(mydf, .(Model), transform, (Length+Length))
I get:
  Model Class Length Speed
1     a     e      1     5
2     a     e      2    10
3     b     e      3    20
4     b     e     10    20
5     c     e     20    15
6     c     e     30    10
So why does adding "sum =" make a difference?
Question 2: Why don't these work?
ddply(mydf, .(Model), sum, Length+Length)
# Error in function (i) : object 'Length' not found
ddply(mydf, .(Model), length, mydf$Length)
# Error in .fun(piece, ...) : 2 arguments passed to 'length' which requires 1
These examples are mainly meant to show that I'm fundamentally misunderstanding something about how to use plyr.
Any answers or explanations are appreciated.
When I'm having trouble "visualizing" how any of the functional tools in R work, I find the easiest thing to do is to call browser() on a single instance:
ddply(mydf, .(Model), function(x) browser() )
Then inspect x in real time, and it should all make sense. You can then test your function on x, and if it works you're golden (barring other groups being different from your first x).
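If you'd rather not drop into an interactive browser() session, a rough non-interactive alternative is base R's split(), which breaks the data frame into one piece per group, similar to the x that ddply hands your function. This is my own substitution rather than anything plyr provides, and it assumes the mydf from the question (rebuilt here so the snippet runs on its own):

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps Model as character
mydf <- data.frame(
  Model  = c("a", "a", "b", "b", "c", "c"),
  Class  = c("e", "e", "e", "e", "e", "e"),
  Length = c(1, 2, 3, 10, 20, 30),
  Speed  = c(5, 10, 20, 20, 15, 10),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per Model value,
# roughly the pieces ddply passes to your function one at a time
pieces <- split(mydf, mydf$Model)
print(names(pieces))   # "a" "b" "c"

# Grab the first piece and try an expression on it before handing it to ddply
x <- pieces[["a"]]
print(x)
x$Length + x$Length    # 2 4
```

Once an expression works on x, the same expression should work inside ddply for every group that looks like x.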
The syntax is:
ddply(data.frame, variable(s), function, optional arguments)
where the function is expected to return a data.frame. In your situation:
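summarise is a plyr function that creates a new data.frame containing the results of the expressions you provide as further arguments (...). If you don't name an expression, it still gets a column (called ..1 in your output), which is why both of your summarise calls gave the same result.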
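transform, a base R function, transforms each data.frame (after the data has been split by the variable(s)), adding new columns according to the expression(s) you provide as further arguments. These need to be named; that's just the way transform works. An unnamed expression is simply ignored, which is why your unnamed transform call returned the pieces unchanged.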
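To see the naming rule in isolation, here is a small check that calls base R's transform() directly, outside ddply, on a rebuilt copy of the question's mydf. In the base R versions I'm familiar with, only named arguments become new columns; an unnamed expression is dropped without an error or warning.

```r
mydf <- data.frame(
  Model  = c("a", "a", "b", "b", "c", "c"),
  Class  = "e",                          # recycled to all 6 rows
  Length = c(1, 2, 3, 10, 20, 30),
  Speed  = c(5, 10, 20, 20, 15, 10),
  stringsAsFactors = FALSE
)

# Named argument: the tag "sum" becomes the new column's name
named <- transform(mydf, sum = Length + Length)
print(names(named))     # "Model" "Class" "Length" "Speed" "sum"

# Unnamed argument: there is no tag to use as a column name, so nothing is added
unnamed <- transform(mydf, Length + Length)
print(names(unnamed))   # "Model" "Class" "Length" "Speed"
```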
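If you use functions other than subset, transform, mutate, with, within, or summarise, you'll need to make sure they return a data.frame (length and sum don't), or at the very least a vector of appropriate length for the output. Also note that ddply calls your function as .fun(piece, ...), with each piece as the first argument. So sum(piece, Length+Length) evaluates Length outside the piece, where it doesn't exist, and length(piece, mydf$Length) passes two arguments to length, which accepts only one.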
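The error messages in Question 2 mention .fun(piece, ...), which hints at the mechanism. Below is a deliberately simplified base R imitation of that split-apply-combine loop. mini_ddply is a hypothetical helper written for illustration, not plyr's real implementation (ddply does more, such as re-adding the grouping columns for you). It shows why transform works, why sum fails, and what a working custom function looks like.

```r
# Hypothetical, simplified stand-in for ddply: split, apply, combine
mini_ddply <- function(.data, .var, .fun, ...) {
  pieces <- split(.data, .data[[.var]])
  # Each piece is passed as the FIRST argument; the extra arguments follow it
  results <- lapply(pieces, function(piece) .fun(piece, ...))
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

mydf <- data.frame(
  Model  = c("a", "a", "b", "b", "c", "c"),
  Class  = "e",
  Length = c(1, 2, 3, 10, 20, 30),
  Speed  = c(5, 10, 20, 20, 15, 10),
  stringsAsFactors = FALSE
)

# transform() evaluates its named expressions inside each piece, so this works
with_sum <- mini_ddply(mydf, "Model", transform, sum = Length + Length)

# This becomes sum(piece, Length + Length); Length is looked up in the
# calling environment rather than in the piece, so the call errors
err <- tryCatch(mini_ddply(mydf, "Model", sum, Length + Length),
                error = function(e) conditionMessage(e))
print(err)   # an "object 'Length' not found" message

# A custom function that returns a one-row data.frame per piece
totals <- mini_ddply(mydf, "Model", function(piece) {
  data.frame(Model = piece$Model[1], total = sum(piece$Length),
             stringsAsFactors = FALSE)
})
print(totals)   # one row per Model, with totals 3, 13 and 50
# With plyr itself, the usual equivalent is:
#   ddply(mydf, .(Model), summarise, total = sum(Length))
```

The design point is that the grouping function never sees your column names as variables. Only functions like transform and summarise, which evaluate their arguments inside the piece, can refer to Length directly. Anything else has to take the piece as its argument and pull columns out of it.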