
How do I sub sample data by group using ddply?

Tags: r, plyr

I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.

I wrote a function to subset a data frame as follows:

    samp <- function(dataf)
    {
        dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
    }

Now I want to apply this function to each species in a larger data frame.

When I try something like

    culled_data = ddply(larger_data, .(species), subset, samp)

I get this error:

    Error in subset.data.frame(piece, ...) : 
      'subset' must evaluate to logical

Anyone got ideas on how to do this?

Maiasaura asked May 27 '10 16:05




2 Answers

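It looks like it should work once you remove ", subset" from your call, so that samp itself is the function ddply applies to each piece: ddply(larger_data, .(species), samp).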
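A minimal sketch of that suggestion follows. The data frame is made up for illustration (it is not the asker's data), and it assumes every species has at least 40 rows:

```r
library(plyr)

set.seed(1)  # only so the toy example is reproducible

# Hypothetical stand-in for the asker's data: 3 species, 100 rows each
larger_data <- data.frame(
  species = rep(c("a", "b", "c"), each = 100),
  x = runif(300),
  y = runif(300)
)

# The asker's sampling function, unchanged
samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

# Pass samp directly as the function applied to each species' piece
culled_data <- ddply(larger_data, .(species), samp)

# Number of sampled rows per species
table(culled_data$species)
```

Note that samp as written stops with an error for any species with fewer than 40 rows, because sample() cannot draw 40 distinct values from a shorter vector when replace=FALSE.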

Dirk Eddelbuettel answered Oct 20 '22 17:10


Dirk's answer is of course correct, but I'm posting my own to add some explanation.

Why doesn't your call work?

First of all, your syntax is shorthand. It is equivalent to

    ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))

so you can clearly see that you are passing a function (see class(samp)) as the second argument of subset. You could pass samp(dfrm) instead, but that won't work either, because samp returns a data.frame and subset needs a logical vector. samp(dfrm) would only work here if it returned a logical index.
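The distinction can be checked on a toy data frame (dfrm and keep below are invented for illustration). subset accepts a logical vector with one value per row, and stops with an error when it is handed a function:

```r
# Toy piece, standing in for one species' rows
dfrm <- data.frame(species = rep("a", 5), x = 1:5)

# The asker's sampling function (defined here but never called)
samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size = 40, replace = FALSE), ]
}

class(samp)  # "function", not a logical vector

# A logical vector, one element per row, is what subset expects
keep <- dfrm$x > 3
kept_rows <- subset(dfrm, keep)

# Passing the function itself triggers subset's "must be logical" check;
# the exact wording of the message differs between R versions
msg <- tryCatch(subset(dfrm, samp), error = function(e) conditionMessage(e))
msg
```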

How can you use subset in this case?

Make subset work by feeding it a logical vector:

    ddply(larger_data, .(species), subset, sample(seq_along(species)<=40))

I create a logical vector with 40 TRUE values and shuffle it. (By the way, this also works when a species has fewer than 40 cases: the vector is then all TRUE, so every row of that species is returned.)
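How the shuffled logical vector behaves can be checked on plain vectors, without ddply. The two vectors below are toy stand-ins for the species column of one piece:

```r
big   <- rep("a", 100)  # a species with 100 rows
small <- rep("b", 25)   # a species with fewer than 40 rows

# seq_along(x) <= 40 marks the first 40 positions TRUE;
# sample() then permutes the vector, so the TRUEs land on random rows
keep_big   <- sample(seq_along(big) <= 40)
keep_small <- sample(seq_along(small) <= 40)

sum(keep_big)     # 40 rows kept
length(keep_big)  # 100, one flag per row
all(keep_small)   # TRUE: every row kept when there are fewer than 40
```

Because the vector always has one element per row, subset never asks for more rows than the piece contains, which is why this approach tolerates small groups.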

Marek answered Oct 20 '22 16:10