I have a data frame of almost 50,000 rows spread across 15 different IDs (every ID has thousands of observations). The data frame looks like:
     ID Year Temp   ph
1    P1 1996 11.3 6.80
2    P1 1996  9.7 6.90
3    P1 1997  9.8 7.10
...
2000 P2 1997 10.5 6.90
2001 P2 1997  9.9 7.00
2002 P2 1997 10.0 6.93
I want to take 500 random rows for every ID (so 500 for P1, 500 for P2, ...) and create a new df. I tried:
new_df <- df[df$ID %in% sample(unique(df$ID), 500), ]
But it selects IDs at random rather than rows, whereas I need 500 random rows for every ID.
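To see why the attempt behaves this way, here is a minimal sketch on made-up data (the toy `df`, `chosen_ids`, and `subset_df` are hypothetical names, not from the original post). It shows that `sample(unique(df$ID), ...)` draws from the distinct ID values, and `%in%` then keeps every row of whichever IDs were drawn:

```r
# Toy data (hypothetical): 3 IDs with 10 rows each.
set.seed(1)  # only so the sketch is reproducible
df <- data.frame(
  ID   = rep(c("P1", "P2", "P3"), each = 10),
  Temp = runif(30, 8, 12)
)

# sample() draws from the 3 distinct ID values, not from the 30 rows.
chosen_ids <- sample(unique(df$ID), 1)

# %in% keeps every row that belongs to the chosen ID(s).
subset_df <- df[df$ID %in% chosen_ids, ]

nrow(subset_df)               # 10: all rows of one ID, not a random subset
length(unique(subset_df$ID))  # 1
```

The sampling has to happen on row indices within each ID, which is what the grouped approaches in the answers do.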
Definition: Random sampling is a sampling technique in which each member of the population has an equal probability of being chosen. A sample chosen randomly is meant to be an unbiased representation of the total population.
The SQL statement below returns a given number of random rows from a table using the RANDOM() function (available in PostgreSQL and SQLite; MySQL uses RAND() instead):
SELECT * FROM table_name ORDER BY RANDOM() LIMIT n;
Replace table_name with the name of your table and n with the number of rows to fetch.
This is available as the slice_sample function in dplyr:
library(dplyr)
new_df <- df %>%
  group_by(ID) %>%
  slice_sample(n = 500)
In versions of dplyr before 1.0.0, the function was called sample_n, which has since been superseded by slice_sample.
Try this:
library(plyr)
ddply(df,.(ID),function(x) x[sample(nrow(x),500),])
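If you would rather avoid extra packages, the same idea can be sketched in base R. This is only an illustration on made-up data: the toy `df`, the `n_per_id` variable, and the `pieces` list are hypothetical names, not part of the answers above. It also caps the sample size at the group size, since `sample(nrow(x), 500)` raises an error for any ID with fewer than 500 rows.

```r
# Toy data standing in for the real data frame (hypothetical values).
set.seed(42)  # only so the sketch is reproducible
df <- data.frame(
  ID   = rep(c("P1", "P2", "P3"), times = c(2000, 1500, 300)),
  Year = sample(1996:2000, 3800, replace = TRUE),
  Temp = round(runif(3800, 8, 12), 1)
)

n_per_id <- 500  # rows wanted per ID

# Split into one data frame per ID, then sample row indices within each
# piece. min() caps the size so IDs with fewer rows are returned whole.
pieces <- lapply(split(df, df$ID), function(x) {
  x[sample.int(nrow(x), min(n_per_id, nrow(x))), , drop = FALSE]
})

# Stack the per-ID samples back into a single data frame.
new_df <- do.call(rbind, pieces)
rownames(new_df) <- NULL

table(new_df$ID)  # P1: 500, P2: 500, P3: 300
```

Whether a short group should be returned whole, sampled with replacement, or treated as an error depends on the analysis; the cap used here is just one reasonable choice.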