How do I sub sample data by group using ddply?

Tags:

r

plyr

I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset.

I wrote a function to subset a data frame as follows:

    samp <- function(dataf) {
        dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
    }

Now I want to apply this function to each species in a larger data frame.

When I try something like

culled_data = ddply (larger_data, .(species), subset, samp)

I get this error:

Error in subset.data.frame(piece, ...) : 
  'subset' must evaluate to logical

Anyone got ideas on how to do this?

asked May 27 '10 16:05

Maiasaura

2 Answers

It looks like it should work once you remove `, subset` from your call, so that `samp` is passed to `ddply` directly as the function to apply to each piece.
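A minimal sketch of the corrected call, assuming a made-up `larger_data` (the columns `x` and `y` and the group sizes are invented for illustration). It uses a hypothetical variant of `samp`, here called `samp_n`, that caps the sample size at the number of rows in the group, because `sample()` without replacement raises an error when asked for more items than are available.

```r
library(plyr)

set.seed(1)  # only to make the toy example reproducible

# Toy data (assumed for illustration): 100 rows of species "a", 25 of species "b"
larger_data <- data.frame(
  species = rep(c("a", "b"), times = c(100, 25)),
  x = runif(125),
  y = runif(125)
)

# Hypothetical variant of samp(): take at most n rows per group, so groups
# with fewer than n rows are returned whole instead of causing an error
samp_n <- function(dataf, n = 40) {
  rows <- sample(seq_len(nrow(dataf)), size = min(n, nrow(dataf)))
  dataf[rows, , drop = FALSE]
}

# Pass the function itself to ddply; there is no subset in the call
culled_data <- ddply(larger_data, .(species), samp_n)

table(culled_data$species)  # 40 rows for "a", 25 rows for "b"
```

The original `samp` from the question can be passed the same way, as long as every species has at least 40 rows.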

answered Oct 20 '22 17:10

Dirk Eddelbuettel

Dirk's answer is of course correct, but I am posting my own to add some explanation.

Why doesn't your call work?

First of all, your syntax is shorthand. It is equivalent to

ddply(larger_data, .(species), function(dfrm) subset(dfrm, samp))

so you can clearly see that you are passing a function (see `class(samp)`) as the second argument of `subset`. You could pass `samp(dfrm)` instead, but that won't work either, because `samp` returns a data.frame and `subset` needs a logical vector. So `samp(dfrm)` would only work here if it returned a logical index.
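To see this outside of `ddply`, here is a small base R sketch. The data frame `piece` and the sample size of 3 are made up for illustration; `piece` stands in for the rows of one species.

```r
# Toy stand-in for one species' piece (assumed for illustration)
piece <- data.frame(species = "a", x = 1:5)

# Same idea as samp() from the question, with a smaller sample size
samp <- function(dataf) dataf[sample(seq_len(nrow(dataf)), size = 3), ]

class(samp)  # "function"

# Passing the function itself fails, as in the question
# (the exact wording of the message depends on the R version)
err_fun <- tryCatch(subset(piece, samp), error = function(e) e)
conditionMessage(err_fun)

# Passing samp(piece) fails too: it is a data.frame, not a logical vector
err_df <- tryCatch(subset(piece, samp(piece)), error = function(e) e)
conditionMessage(err_df)

# A logical vector with one value per row is what subset expects
ok <- subset(piece, x <= 3)
nrow(ok)  # 3
```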

How to use subset in this case?

Make `subset` work by feeding it a logical vector:

ddply (larger_data, .(species), subset, sample(seq_along(species)<=40))

I create a logical vector with 40 `TRUE` values and shuffle it. By the way, this also works when a species has fewer than 40 cases: the vector is then all `TRUE`, so every row for that species is returned.
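The shuffled logical vector can be checked on its own in base R. In this sketch `seq_len(n)` stands in for `seq_along(species)` inside a group of `n` rows, and the group sizes 50 and 10 are made up for illustration.

```r
set.seed(42)  # only to make the toy example reproducible

# A group with 50 rows: exactly 40 TRUE values, in random positions
keep_big <- sample(seq_len(50) <= 40)
sum(keep_big)     # 40
length(keep_big)  # 50

# A group with only 10 rows: every value is TRUE, so all rows are kept
keep_small <- sample(seq_len(10) <= 40)
all(keep_small)   # TRUE
```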

answered Oct 20 '22 16:10

Marek
