I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.
I wrote a function to subset a data frame as follows:
samp <- function(dataf)
{
  # draw 40 distinct row indices at random and return those rows
  dataf[sample(nrow(dataf), size = 40, replace = FALSE), ]
}
Now I want to apply this function to each species in a larger data frame.
When I try something like
culled_data = ddply (larger_data, .(species), subset, samp)
I get this error:
Error in subset.data.frame(piece, ...) :
'subset' must evaluate to logical
Anyone got ideas on how to do this?
Today we will emphasize ddply(), which accepts a data.frame, splits it into pieces based on one or more factors, computes on the pieces, then returns the results as a data.frame.
ddply: Split data frame, apply function, and return results in a data frame.
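To make the split / apply / combine idea concrete, here is a rough base R sketch of the same three steps. It is only an illustration of the concept, not how plyr is implemented, and the toy data and column names are made up.

```r
# Toy data: the column names are made up for illustration.
toy <- data.frame(
  species = rep(c("a", "b"), times = c(3, 2)),
  x       = c(1, 2, 3, 10, 20)
)

# Split: one data frame per species.
pieces <- split(toy, toy$species)

# Apply: run a function on each piece; here, keep its first row.
firsts <- lapply(pieces, function(piece) piece[1, , drop = FALSE])

# Combine: stack the per-piece results back into one data frame.
result <- do.call(rbind, firsts)
result
```

ddply does these three steps in one call, with the grouping given as .(species) and the per-piece function passed as the third argument.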
It looks like it should work once you remove the subset argument from your call, leaving ddply(larger_data, .(species), samp).
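A minimal sketch of that fix on made-up data, assuming the plyr package is installed. The contents of larger_data here (three species with 100 rows each, columns x and y) are hypothetical stand-ins for the real data.

```r
library(plyr)  # assumed to be installed

set.seed(1)  # only so the toy example is reproducible

# Hypothetical stand-in for larger_data: 3 species, 100 rows each.
larger_data <- data.frame(
  species = rep(c("sp1", "sp2", "sp3"), each = 100),
  x       = runif(300),
  y       = runif(300)
)

# The sampling function from the question.
samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

# Pass samp directly as the function applied to each species' piece.
culled_data <- ddply(larger_data, .(species), samp)

# Count the rows kept per species.
table(culled_data$species)
```

Note that samp as written will fail for any species with fewer than 40 rows, because sample() cannot draw more items than exist when replace = FALSE.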
Dirk's answer is of course correct, but I'll post my own to add some explanation.
First of all, your syntax is shorthand. It is equivalent to
ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))
so you can clearly see that you are passing a function (see class(samp)) as the second argument of subset. You could use samp(dfrm) instead, but that won't work either, because samp returns a data.frame and subset needs a logical vector. So you could only use samp(dfrm) if it returned a logical index.
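You can see the difference without plyr at all. The sketch below uses a made-up five-row data frame and a smaller version of samp (2 rows instead of 40); the exact wording of the error message may differ between R versions.

```r
toy <- data.frame(species = rep("a", 5), x = 1:5)  # made-up data

# Same idea as samp in the question, but sampling 2 rows.
samp <- function(dataf) dataf[sample(nrow(dataf), size = 2), ]

# A logical condition is what subset() expects.
ok <- subset(toy, x > 3)

# Passing the function itself is not a logical vector, so subset() stops.
msg <- tryCatch(
  subset(toy, samp),
  error = function(e) conditionMessage(e)
)
msg
```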
To make subset work, feed it a logical vector:
ddply (larger_data, .(species), subset, sample(seq_along(species)<=40))
This creates a logical vector with 40 TRUE values and then shuffles it. (By the way, it also works when a species has fewer than 40 rows; in that case it returns all of them.)
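Here is what that logical vector looks like on its own, outside ddply. In this sketch seq_len(n) stands in for seq_along(species) on a group of n rows; the group sizes 100 and 10 are arbitrary.

```r
set.seed(42)  # only for reproducibility

# Group with more than 40 rows: exactly 40 TRUEs, scattered at random.
big <- sample(seq_len(100) <= 40)
sum(big)    # 40

# Group with fewer than 40 rows: every element is TRUE, so all rows are kept.
small <- sample(seq_len(10) <= 40)
all(small)  # TRUE
```

Because the number of TRUE values is capped by the group size, this approach never asks for more rows than the group has.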