Extracting a random sample of rows in a data.frame with a nested conditional

Tags: random, r

This question builds on the SO post found here and uses code modified from a post on the R-help mailing list, which can be seen here.

I am trying to extract a random sample of rows from a data frame, subject to a condition. I am using R's built-in iris data, which looks like this:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The code below works fine for taking a simple random sample of 2 rows.

iris[sample(nrow(iris), 2), ]

However, I am unsure how to condition on the Species field. For example, how do I take the random sample shown above, but only from rows where Species != "setosa"?

There are three categories of iris$Species:

> summary(iris$Species)
    setosa versicolor  virginica 
        50         50         50

I am unsure how to nest the conditional correctly. One of my earlier attempts is below, with the obviously incorrect results included:

> iris[sample(nrow(iris)[iris$Species != "setosa"], 2), ]
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
NA             NA          NA           NA          NA    <NA>
NA.1           NA          NA           NA          NA    <NA>

Thanks


asked Nov 14 '13 22:11

B. Davis

2 Answers

I'd use which to get the vector of row numbers that satisfy your condition, and then sample from that vector:

iris[ sample( which( iris$Species != "setosa" ) , 2 ) , ]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#59           6.6         2.9          4.6         1.3 versicolor
#133          6.4         2.8          5.6         2.2  virginica
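
To see why the attempt in the question returns NA rows, and to guard against one quirk of sample, here is a minimal sketch. The helper name sample_rows_where and the seed are my own choices for illustration, not something from the question. nrow(iris) is the single number 150, so subsetting it with a length-150 logical vector gives NA for every TRUE position past the first element. Separately, sample(x, n) treats a single number x as 1:x, so if only one row matched the condition, sampling the row numbers directly could pick the wrong row. Sampling positions within the index vector avoids that.

```r
# The attempt from the question: nrow(iris) is just 150, a length-1 vector.
# Subsetting it with a length-150 logical returns NA for each TRUE beyond
# position 1, so the row index is all NA.
bad_idx <- nrow(iris)[iris$Species != "setosa"]

# Hypothetical helper: sample n rows of df where the logical vector keep is TRUE.
sample_rows_where <- function(df, keep, n) {
  idx <- which(keep)            # row numbers meeting the condition (NAs dropped)
  stopifnot(n <= length(idx))   # cannot sample more rows than match
  # Sample positions within idx rather than idx itself, because
  # sample(x, n) treats a single number x as 1:x.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(1)  # arbitrary seed, only for reproducibility
picked <- sample_rows_where(iris, iris$Species != "setosa", 2)

# Edge case: exactly one matching row still returns that row.
only_last <- sample_rows_where(iris, seq_len(nrow(iris)) == 150, 1)
```

For the iris example the plain which/sample one-liner is fine, since 100 rows match; the guard only matters when the condition might match a single row.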

answered Sep 25 '22 23:09

Simon O'Hanlon

With dplyr:

library(dplyr)
set.seed(12)
filter(iris, Species != "setosa") %>% sample_n(., 2)

Output:

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
7           6.3         3.3          4.7         1.6 versicolor
81          7.4         2.8          6.1         1.9  virginica
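
Note that the row labels 7 and 81 above are positions in the filtered data, not in the original iris (filter does not keep the original row numbers). If you need those, one option is to store them in a column before filtering. The sketch below assumes dplyr 1.0.0 or later, where sample_n is superseded by slice_sample; the column name orig_row and the seed are arbitrary choices for illustration.

```r
library(dplyr)

set.seed(12)  # arbitrary seed, only for reproducibility

sampled <- iris %>%
  mutate(orig_row = row_number()) %>%  # remember each row's position in iris
  filter(Species != "setosa") %>%      # keep only the non-setosa rows
  slice_sample(n = 2)                  # draw 2 rows at random

sampled
```

The orig_row values will fall between 51 and 150, because the first 50 rows of iris are all setosa.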

answered Sep 24 '22 23:09

mpalanco
