dplyr sample_n by group with unique size argument per group

Question

I am trying to draw a stratified sample from a data set for which a variable exists that indicates how large the sample size per group should be.

library(dplyr)
# example data 
df <- data.frame(id = 1:15,
                 grp = rep(1:3,each = 5), 
                 frq = rep(c(3,2,4), each = 5))

In this example, grp refers to the group I want to sample by and frq is the sample size specified for that group.
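
Before sampling, it may be worth confirming that the data really has this shape: one frq value per group, and no frq larger than the group itself (sampling without replacement would fail otherwise). A minimal sketch of such a check; the name check and its columns n_rows, n_sizes and size are made up for illustration:

```r
library(dplyr)

# same example data as above
df <- data.frame(id = 1:15,
                 grp = rep(1:3, each = 5),
                 frq = rep(c(3, 2, 4), each = 5))

# one row per group: group size, number of distinct frq values, requested size
check <- df %>%
  group_by(grp) %>%
  summarise(n_rows  = n(),
            n_sizes = n_distinct(frq),
            size    = first(frq))

stopifnot(all(check$n_sizes == 1),          # frq is constant within each group
          all(check$size <= check$n_rows))  # enough rows to sample without replacement
```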

Using split, I came up with this possible solution, which gives the desired result but seems rather inefficient:

s <- split(df, df$grp)
lapply(s,function(x) sample_n(x, size = unique(x$frq))) %>% 
      do.call(what = rbind)

Is there a way using just dplyr's group_by and sample_n to do this?

My first thought was:

df %>% group_by(grp) %>% sample_n(size = frq)

but this gives the error:

Error in is_scalar_integerish(size) : object 'frq' not found

thc · Accepted Answer

This works, using the first frq value within each group as the sample size:

df %>% group_by(grp) %>% sample_n(frq[1])

# A tibble: 9 x 3
# Groups:   grp [3]
     id   grp   frq
  <int> <int> <dbl>
1     3     1     3
2     4     1     3
3     2     1     3
4     6     2     2
5     8     2     2
6    13     3     4
7    14     3     4
8    12     3     4
9    11     3     4

Not sure why it didn't work when you tried it.
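
If sample_n still refuses a column as its size (the error in the question suggests this may depend on the installed dplyr version), one way around it is to leave the size argument out entirely: shuffle the row positions within each group and keep the rows whose shuffled rank is at most frq. This is a different technique from the one above, sketched under the assumption that frq is constant within each group; the name sampled is just for illustration.

```r
library(dplyr)

df <- data.frame(id = 1:15,
                 grp = rep(1:3, each = 5),
                 frq = rep(c(3, 2, 4), each = 5))

set.seed(1)  # only to make the draw reproducible

sampled <- df %>%
  group_by(grp) %>%
  # sample(n()) is a random permutation of 1..n() within the group;
  # the rows whose permuted rank is <= frq form a random subset of size frq
  filter(sample(n()) <= frq) %>%
  ungroup()

# rows drawn per group: 3, 2 and 4
count(sampled, grp)
```

It only relies on filter, n() and base sample, so it does not depend on how sample_n evaluates its size.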

AntoniosK · Answer

library(tidyverse)

# example data 
df <- data.frame(id = 1:15,
                 grp = rep(1:3,each = 5), 
                 frq = rep(c(3,2,4), each = 5))

set.seed(22)

df %>%
  group_by(grp) %>%   # for each group
  nest() %>%          # nest data
  mutate(v = map(data, ~sample_n(data.frame(id=.$id), unique(.$frq)))) %>%  # sample using id values and (unique) frq value
  unnest(v)           # unnest the sampled values

# # A tibble: 9 x 2
#     grp    id
#   <int> <int>
# 1     1     2
# 2     1     5
# 3     1     3
# 4     2     8
# 5     2     9
# 6     3    14
# 7     3    13
# 8     3    15
# 9     3    11

sample_n works here because, for each group, it is given a data frame of ids (not a vector of ids) and a single frq value.

An alternative version using map2 and generating the inputs for sample_n in advance:

df %>%
  group_by(grp) %>%                                 # for every group
  summarise(d = list(data.frame(id=id)),            # create a data frame of ids
            frq = unique(frq)) %>%                  # get the unique frq value
  mutate(v = map2(d, frq, ~sample_n(.x, .y))) %>%   # sample using data frame of ids and frq value
  unnest(v) %>%                                     # unnest sampled values
  select(-frq)                                      # remove frq column (if needed)
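
In more recent dplyr releases (1.0.0 and later, as far as I know), sample_n is superseded by slice_sample, and group_modify can hand each group's data frame to a function. A sketch along those lines, assuming such a dplyr version is installed and that frq is constant within each group; the name sampled is only for illustration:

```r
library(dplyr)

df <- data.frame(id = 1:15,
                 grp = rep(1:3, each = 5),
                 frq = rep(c(3, 2, 4), each = 5))

set.seed(22)

sampled <- df %>%
  group_by(grp) %>%
  # .x is the data frame for one group (without the grp column);
  # its frq column carries that group's sample size
  group_modify(~ slice_sample(.x, n = .x$frq[1])) %>%
  ungroup()

# rows drawn per group: 3, 2 and 4
count(sampled, grp)
```

Unlike the versions above, this keeps all original columns (id, grp, frq) in the result.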

Tags:

r

dplyr

Fred
