I am building confidence intervals for groups with bootstrapped values and I'm having trouble creating multiple re-sampled datasets from which to build my confidence intervals.
Using the palmerpenguins library as an example:
library(tidyverse)
library(infer)
library(palmerpenguins)
There are 344 total observations and each species has a different number of observations:
nrow(penguins)
[1] 344
penguins %>% group_by(species) %>% count()
# A tibble: 3 × 2
# Groups: species [3]
species n
<fct> <int>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
I want to group by species and, for each species, draw multiple resamples that each keep the original number of observations for that group.
set.seed(100)
slices <- penguins %>%
group_by(species) %>%
rep_slice_sample(prop = 1, replace = TRUE, reps = 10)
That should give me 344 * 10 = 3440 lines in the full new data set. This is true, but when you look at the data you can see that the number of observations per species differs from replicate to replicate. In every replicate, n should be 152 for Adelie, 68 for Chinstrap, and 124 for Gentoo. Instead we find this:
slices %>% group_by(species, replicate) %>% count()
# A tibble: 30 × 3
# Groups: species, replicate [30]
species replicate n
<fct> <int> <int>
1 Adelie 1 148
2 Adelie 2 147
3 Adelie 3 148
4 Adelie 4 151
5 Adelie 5 138
6 Adelie 6 157
7 Adelie 7 161
8 Adelie 8 157
9 Adelie 9 151
10 Adelie 10 138
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows
What am I missing?
Another option with slice_sample:
(penguins %>% slice_sample(prop = 10, replace = TRUE, by = species) also works, i.e. with prop = 10, but doesn't provide the replicate number.)
library(tidyverse)
library(palmerpenguins)
set.seed(100)
slices <- map(1:10, \(x)(
penguins %>%
slice_sample(prop = 1, replace = TRUE, by = species) |>
mutate(replicate = x)
)) |>
bind_rows()
slices %>% count(species, replicate)
#> # A tibble: 30 × 3
#> species replicate n
#> <fct> <int> <int>
#> 1 Adelie 1 152
#> 2 Adelie 2 152
#> 3 Adelie 3 152
#> 4 Adelie 4 152
#> 5 Adelie 5 152
#> 6 Adelie 6 152
#> 7 Adelie 7 152
#> 8 Adelie 8 152
#> 9 Adelie 9 152
#> 10 Adelie 10 152
#> # ℹ 20 more rows
Created on 2024-03-17 with reprex v2.1.0
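Since the goal in the question is confidence intervals, here is a minimal sketch of how the stratified replicates could be turned into percentile bootstrap intervals. Several things here are my assumptions rather than anything from the question: body_mass_g as the variable of interest, the mean as the statistic, 1000 replicates instead of 10, and the percentile method for the 95% interval. It also assumes dplyr >= 1.1.0 for the by / .by arguments.

```r
library(dplyr)
library(purrr)
library(palmerpenguins)

set.seed(100)

# Assumed number of replicates; 10 is too few for a stable percentile interval.
n_reps <- 1000

# Stratified bootstrap: each replicate resamples within species,
# so every species keeps its original sample size.
slices <- map(seq_len(n_reps), \(x) {
  penguins |>
    slice_sample(prop = 1, replace = TRUE, by = species) |>
    mutate(replicate = x)
}) |>
  bind_rows()

# Statistic of interest per species and replicate
# (mean body mass is only an illustrative choice).
boot_means <- slices |>
  summarise(
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    .by = c(species, replicate)
  )

# 95% percentile interval per species.
boot_ci <- boot_means |>
  summarise(
    lower = quantile(mean_mass, 0.025, names = FALSE),
    upper = quantile(mean_mass, 0.975, names = FALSE),
    .by = species
  )

boot_ci
```

Swapping in a different statistic only means changing the first summarise() call; the resampling step stays the same.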