Take a random sample by group

Tags: dataframe, r, sample

I have a data frame of almost 50,000 rows spread across 15 different IDs (every ID has thousands of observations). The data frame looks like this:

        ID  Year    Temp    ph
1       P1  1996    11.3    6.80
2       P1  1996    9.7     6.90
3       P1  1997    9.8     7.10
...
2000    P2  1997    10.5    6.90
2001    P2  1997    9.9     7.00
2002    P2  1997    10.0    6.93

I want to take 500 random rows for every ID (so 500 for P1, 500 for P2, ...) and create a new df. I tried:

new_df <- df[df$ID %in% sample(unique(df$ID), 500), ]

But this picks IDs at random rather than rows, while I need 500 random rows for every ID.


asked Aug 15 '13 17:08

matteo

2 Answers

This is available as the slice_sample function in dplyr:

library(dplyr)
new_df <- df %>% group_by(ID) %>% slice_sample(n = 500)

In older versions of dplyr (before 1.0.0), the function was called sample_n, which has since been superseded by slice_sample.
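To see what the grouped sample returns, here is a small self-contained sketch. The toy data frame, the seed, and the sample size of 5 are made up for illustration and are not from the question. It also shows what I believe happens when a group has fewer rows than n: without replacement, slice_sample just returns all rows of that group.

```r
library(dplyr)

# Toy data standing in for the question's df (values are made up)
set.seed(42)  # fix the seed so the sample is reproducible
df <- data.frame(
  ID   = rep(c("P1", "P2", "P3"), times = c(10, 8, 2)),
  Year = sample(1996:1998, 20, replace = TRUE),
  Temp = round(runif(20, 9, 12), 1)
)

new_df <- df %>%
  group_by(ID) %>%
  slice_sample(n = 5) %>%   # up to 5 random rows per ID
  ungroup()                 # drop the grouping so later steps see a plain tibble

# Rows per ID: P1 and P2 give 5 each; P3 only has 2 rows, so it gives 2
count(new_df, ID)
```

If every group must contribute exactly n rows even when it is smaller than n, passing replace = TRUE to slice_sample samples with replacement instead.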


answered Sep 23 '22 23:09

drhagen

Try this:

library(plyr)
ddply(df,.(ID),function(x) x[sample(nrow(x),500),])
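The same split-apply-combine idea can also be sketched in base R without loading plyr. This is only an illustration: the toy data and the name n_per_id are assumptions, not part of the original answer. Note that sample(nrow(x), 500) stops with an error if a group has fewer than 500 rows, so this sketch caps the size at the group's row count.

```r
# Toy data standing in for the question's df (made-up values)
set.seed(1)
df <- data.frame(
  ID   = rep(c("P1", "P2"), times = c(6, 3)),
  Temp = round(runif(9, 9, 12), 1)
)

n_per_id <- 4  # hypothetical sample size; the question uses 500

# Split by ID, sample row positions within each piece, then stack the pieces
pieces <- lapply(split(df, df$ID), function(x) {
  k <- min(n_per_id, nrow(x))            # cap at group size to avoid an error
  x[sample(nrow(x), k), , drop = FALSE]  # k random rows, kept as a data frame
})
new_df <- do.call(rbind, pieces)

table(new_df$ID)  # P1 contributes 4 rows, P2 only has 3 so it contributes 3
```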

answered Sep 24 '22 23:09

joran
