I have a vector of values r as follows:
r<-c(1,3,4,6,7)
and a data frame df with 21 records and two columns:
id<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq<-c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df<-data.frame(id,freq)
Using the r vector, I need to extract a sample of records (as a new data frame) from df such that the freq values of the selected records equal the values in r. If several records share the same freq value, one of them should be picked at random. For instance, one possible outcome is:
id freq
12 1
10 3
4 4
7 6
8 7
I would be thankful if anyone could help me with this.
You could try data.table
library(data.table)
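# for each freq value found in r, draw one id at random and name the result column id
setDT(df)[freq %in% r, .(id = sample(id, 1L)), by = freq]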
Or using base R
aggregate(id~freq, df, subset=freq %in% r, FUN= sample, 1L)
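A caveat that applies to both snippets above: when a group contains a single id, sample(id, 1L) treats that number n as the range 1:n and may return an id that is not in the group. This does not happen with the example data, where every freq value in r has at least two candidates, but it can with other inputs. A minimal base R sketch that avoids the issue by sampling row positions instead of ids (pick_one_row is a hypothetical helper name, and the data is copied from the question):

```r
# Example data from the question
r <- c(1, 3, 4, 6, 7)
id <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df <- data.frame(id, freq)

# pick_one_row is a hypothetical helper, not part of any package
pick_one_row <- function(value, data) {
  rows <- which(data$freq == value)            # positions of matching rows
  if (length(rows) == 0L) {
    return(data[0, , drop = FALSE])            # no match: contribute nothing
  }
  # sample a position, not a value, so a single match is returned as-is
  data[rows[sample.int(length(rows), 1L)], , drop = FALSE]
}

res <- do.call(rbind, lapply(r, pick_one_row, data = df))
row.names(res) <- NULL
res
```

Each value in r is drawn independently here, so this sketch is meant for an r without repeated values.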
If "r" contains duplicate values and you want to sample, for each unique value, as many rows of 'df' as the number of times that value occurs in 'r':
r <-c(1,3,3,4,6,7)
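res <- do.call(rbind, lapply(split(r, r), function(x) {
  # rows of df whose freq matches this value of r
  x1 <- df[df$freq %in% x, ]
  # draw as many distinct rows as the value occurs in r
  x1[sample(seq_len(nrow(x1)), length(x), replace = FALSE), ]
}))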
row.names(res) <- NULL
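The same idea can be wrapped in a function that reports a clear error when 'df' has too few rows for a requested value. This is only a sketch under a few assumptions: sample_by_counts is a hypothetical name, the data is copied from the question, and the values in 'r' are whole numbers, so converting the names of table() back to numbers is exact.

```r
# Example data from the question, with the value 3 requested twice
r <- c(1, 3, 3, 4, 6, 7)
id <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df <- data.frame(id, freq)

# sample_by_counts is a hypothetical helper, not part of any package
sample_by_counts <- function(data, values) {
  counts <- table(values)                       # how often each value is requested
  picked <- lapply(names(counts), function(v) {
    rows <- which(data$freq == as.numeric(v))   # candidate row positions
    n <- counts[[v]]
    if (length(rows) < n) {
      stop("not enough rows with freq == ", v)
    }
    rows[sample.int(length(rows), n)]           # n distinct positions
  })
  out <- data[unlist(picked), , drop = FALSE]
  row.names(out) <- NULL
  out
}

set.seed(42)  # only to make the draw repeatable
res <- sample_by_counts(df, r)
res
```

Sorting res$freq should give back sort(r), which is a quick way to check the result.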
You can use filter and sample_n from "dplyr":
library(dplyr)
set.seed(1)
df %>%
  filter(freq %in% r) %>%
  group_by(freq) %>%
  sample_n(1)
# Source: local data frame [5 x 2]
# Groups: freq
#
# id freq
# 1 12 1
# 2 10 3
# 3 17 4
# 4 13 6
# 5 8 7