Sample n random rows per group in a dataframe

Tags: random, dataframe, r, sample

From these questions (Random sample of rows from subset of an R dataframe, and Sample random rows in dataframe) I can easily see how to randomly sample (select) 'n' rows from a df, or 'n' rows that originate from a specific level of a factor within a df.

Here are some sample data:

    df <- data.frame(matrix(rnorm(80), nrow=40))
    df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)

    df[sample(nrow(df), 3), ] # samples 3 random rows from df, without replacement

To sample, say, just 3 random rows from the 'pink' color, using library(kimisc):

    library(kimisc)
    sample.rows(subset(df, color == "pink"), 3)

or writing a custom function:

    sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
    sample.df(subset(df, color == "pink"), 3)

However, I want to sample 3 (or n) random rows from each level of the factor, i.e. the new df would have 12 rows (3 from blue, 3 from red, 3 from yellow, 3 from pink). It's obviously possible to run this several times, create a new df for each color, and then bind them together, but I am looking for a simpler solution.

655

asked May 23 '14 14:05

jalapic

1 Answer

In dplyr 0.3 and later, this works just fine:

    library(dplyr)
    df %>% group_by(color) %>% sample_n(size = 3)
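As a side note for readers on recent releases: since dplyr 1.0.0, `sample_n()` is superseded by `slice_sample()`. The sketch below assumes dplyr 1.0.0 or later is installed; the seed and the `sampled` name are only there for illustration and are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only so the draw is reproducible

# Same sample data as in the question
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# slice_sample() draws n rows within each group of a grouped data frame
sampled <- df %>%
  group_by(color) %>%
  slice_sample(n = 3) %>%
  ungroup()
```

`sampled` should then hold 12 rows, 3 per color. The `ungroup()` call is optional and only drops the grouping so later steps are not applied per color by accident.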

Old versions of `dplyr` (version <= 0.2)

I set out to answer this using dplyr, assuming that this would work:

    df %.% group_by(color) %.% sample_n(size = 3)

But it turns out that in 0.2 the sample_n.grouped_df S3 method exists but isn't registered in the NAMESPACE file, so it's never dispatched. Instead, I had to do this:

    df %.% group_by(color) %.% dplyr:::sample_n.grouped_df(size = 3)

    Source: local data frame [12 x 3]
    Groups: color

                X1         X2  color
    8   0.66152710 -0.7767473   blue
    1  -0.70293752 -0.2372700   blue
    2  -0.46691793 -0.4382669   blue
    32 -0.47547565 -1.0179842   pink
    31 -0.15254540 -0.6149726   pink
    39  0.08135292 -0.2141423   pink
    15  0.47721644 -1.5033192    red
    16  1.26160230  1.1202527    red
    12 -2.18431919  0.2370912    red
    24  0.10493757  1.4065835 yellow
    21 -0.03950873 -1.1582658 yellow
    28 -2.15872261 -1.5499822 yellow

Presumably this will be fixed in a future update.
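If depending on a particular dplyr version is a concern, the "split, sample, bind" idea mentioned in the question can also be written compactly in base R. This is only a minimal sketch: `sample_per_group` is a hypothetical helper name, not something from dplyr or kimisc, and it assumes every group has at least `n` rows.

```r
set.seed(42)  # assumption: fixed seed only so the draw is reproducible

# Same sample data as in the question
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Hypothetical helper: split by a grouping column, sample n rows from
# each piece without replacement, then bind the pieces back together.
# sample() raises an error if a group has fewer than n rows.
sample_per_group <- function(data, group_col, n) {
  groups <- split(data, data[[group_col]])
  picked <- lapply(groups, function(g) g[sample(nrow(g), n), , drop = FALSE])
  do.call(rbind, picked)
}

sampled <- sample_per_group(df, "color", 3)
```

The result has 3 rows per color. The row names come out prefixed with the group name (for example `blue.8`), which is a side effect of `rbind()` on a named list.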

137

answered Sep 23 '22 01:09

joran
