Sample a proportion of each group, BUT with a minimum constraint (using dplyr)

Question

I have a population of 6 categories (stratum) and I want in each stratum to take the 10% as a sample. Doing so I take:

var = c(rep("A",10),rep("B",10),rep("C",3),rep("D",5),"E","F");var
value = rnorm(30)
dat = tibble(var,value);
pop=dat%>%group_by(var)
pop
singleallocperce = slice_sample(pop, prop=0.1);
singleallocperce

with result:

# A tibble: 2 x 2
# Groups:   var [2]
  var   value
  <chr> <dbl>
1 A     -1.54
2 B     -1.12

But I want even if in some stratum that the polupation inside them cannot reach the taken sample of 10% to take at least one observation.How can I do it this in R using dplyr package?

Additional

Additionally if I want to make proportional allocation sampling (ie with weight proportional to the subpopulation of each stratum fro example for A the weight will be : 10/30,for B: 10/30,for C:3/30,D:5/30 etc ) keeping the constrain of 1 observation if the subpopulation does not meet that requirement ?

Carl · Accepted Answer

Possible approach (note the presence of 20 x A to check two are returned).

library(tidyverse)

# Data (note 20 As)
var = c(rep("A",20),rep("B",10),rep("C",3),rep("D",5),"E","F")
value = rnorm(40)
dat = tibble(var, value)

# Possible approach
dat %>%
  group_by(var) %>%
  mutate(min = if_else(n() * 0.1 >= 1, n() * 0.1, 1),
         random = sample(n())) %>%
  filter(random <= min) |> 
  select(var, value)
#> # A tibble: 7 × 2
#> # Groups:   var [6]
#>   var     value
#>   <chr>   <dbl>
#> 1 A      0.0105
#> 2 A      0.171 
#> 3 B     -1.89  
#> 4 C      1.89  
#> 5 D      0.612 
#> 6 E      0.516 
#> 7 F      0.185

^{Created on 2022-06-02 by the reprex package (v2.0.1)}

Weighted version:

library(tidyverse)

# Data (note 20 As)
var = c(rep("A",20),rep("B",10),rep("C",3),rep("D",5),"E","F")
value = rnorm(40)
dat = tibble(var, value)

# Possible approach
dat %>%
  add_count(name = "n_all") %>%
  group_by(var) %>%
  mutate(
    weight = n() / n_all,
    min = if_else(n() * weight >= 1, n() * weight, 1),
    random = sample(n())
  ) %>%
  filter(random <= min) |>
  select(var, value)
#> # A tibble: 16 × 2
#> # Groups:   var [6]
#>    var     value
#>    <chr>   <dbl>
#>  1 A      0.339 
#>  2 A      1.77  
#>  3 A     -0.145 
#>  4 A     -0.915 
#>  5 A      0.146 
#>  6 A      0.896 
#>  7 A     -0.407 
#>  8 A     -1.30  
#>  9 A      1.22  
#> 10 A      0.0527
#> 11 B     -0.602 
#> 12 B     -0.432 
#> 13 C     -0.0540
#> 14 D     -1.45  
#> 15 E      1.54  
#> 16 F      0.879

^{Created on 2022-06-09 by the reprex package (v2.0.1)}

Matt · Answer

Here is a potential solution:

sample_func <- function(data) {
  standard <- data %>% 
    group_by(var) %>% 
    slice_sample(prop = 0.1) %>% 
    ungroup()
  
  if(!all(unique(data$var) %in% unique(standard$var))) {
    mins <- data %>% 
      filter(!var %in% standard$var) %>% 
      group_by(var) %>% 
      slice(1) %>% 
      ungroup()
  }
  
  bind_rows(standard, mins) 
  
}

sample_func(dat)

Which gives:

  var     value
  <chr>   <dbl>
1 A      1.36  
2 B     -1.03  
3 C     -0.0450
4 D     -0.380 
5 E     -0.0556
6 F      0.519

The assumption is that if you are sampling proportionally and do not have any sample for var, that the minimum threshold would be sampling one record from var (by using slice(1)).

Sample a proportion of each group, BUT with a minimum constraint (using dplyr)

Tags:

r

dplyr

sample

Homer Jay Simpson

2 Answers

Carl

Matt

Recent Activity

Donate For Us

Sample a proportion of each group, BUT with a minimum constraint (using dplyr)

Tags:

r

dplyr

sample

Homer Jay Simpson

2 Answers

Carl

Matt

Related questions

Recent Activity

Donate For Us