
Random sample from a data frame in R

Tags:

r

I have the following data frame:

id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

id     date      date2
1   23-01-08 2008-01-23
1   01-11-07 2007-11-01
2   30-11-07 2007-11-30
3   17-12-07 2007-12-17
3   12-12-08 2008-12-12

Now I want to draw a random sample of ids rather than of rows. That is, I am looking for a way to randomly pick two of the ids and extract all records belonging to them. For instance, if ids 2 and 3 are picked, the output data frame should look like:

id     date      date2
2   30-11-07 2007-11-30
3   17-12-07 2007-12-17
3   12-12-08 2008-12-12

Any help would be appreciated.

AliCivil asked Dec 05 '22 00:12



2 Answers

You can randomly pick two IDs using sample():

chosen <- sample(unique(df$id), 2)

and then extract those records:

subset(df, id %in% chosen)
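
The two steps above can be put together as a minimal, self-contained sketch. The data frame is rebuilt from the question; the call to set.seed() and the name `result` are assumptions added here only so the draw can be repeated, and the seed value is arbitrary.

```r
# Rebuild the example data frame from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Assumption: fix the seed so the same ids are drawn on every run
set.seed(42)

# Step 1: draw two distinct ids (sample() draws without replacement by default)
chosen <- sample(unique(df$id), 2)

# Step 2: keep every row whose id is among the drawn ids
result <- subset(df, id %in% chosen)

print(chosen)
print(result)
```

One caveat worth knowing: when sample() is given a single number n of at least 1, it samples from 1:n instead of from that one value, so this pattern needs extra care if the data could ever contain only one distinct numeric id.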
MrFlick answered Dec 06 '22 15:12



Or, using dplyr:

library(dplyr)
df %>% 
    filter(id %in% sample(unique(id),2))
#  id     date      date2
#1  2 30-11-07 2007-11-30
#2  3 17-12-07 2007-12-17
#3  3 12-12-08 2008-12-12

Or, by sampling the distinct ids first and joining back to the original data:

df %>%
     select(id) %>%
     unique() %>%
     sample_n(2) %>%
     semi_join(df, .)
#  id     date      date2
#1  1 23-01-08 2008-01-23
#2  1 01-11-07 2007-11-01
#3  2 30-11-07 2007-11-30
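
Since dplyr 1.0.0, sample_n() has been superseded by slice_sample(). A sketch of the same pipeline with the newer verb might look like the following; it assumes dplyr 1.0.0 or later is installed, and the seed and the names `sampled_ids` and `result` are illustrative choices, not part of the answer above.

```r
library(dplyr)

# Rebuild the example data frame from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Assumption: fix the seed so the draw can be repeated
set.seed(1)

# One row per distinct id, then draw two of those rows at random
sampled_ids <- df %>%
    distinct(id) %>%
    slice_sample(n = 2)

# Keep all rows of df whose id appears among the sampled ids
result <- semi_join(df, sampled_ids, by = "id")

print(result)
```

Naming the join column with by = "id" avoids the message dplyr prints when it has to guess the join keys.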
akrun answered Dec 06 '22 14:12
