How can I draw n rows from a group where each group has a different number of rows?
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
I've tried:
library(dplyr)
outdat <- df %>%
  group_by(color) %>%
  sample_n(nrow(.), replace = TRUE)
outdat
but here nrow(.) is the number of rows in the whole df, not in each group's subset, so the sample size is wrong.
This SO post is close, but it draws a fixed number of rows. I need the number of draws to match each group's size, within dplyr.
One workaround is to use sample_frac with a fraction of 1, which draws as many rows as each group contains:
outdat <- df %>%
  group_by(color) %>%
  sample_frac(1, replace = TRUE)
outdat
# # A tibble: 40 x 3
# # Groups: color [4]
# X1 X2 color
# <dbl> <dbl> <chr>
# 1 0.69256186 0.97180252 blue
# 2 1.54384827 -0.20268802 blue
# 3 -1.20068240 -0.45402013 blue
# 4 2.63407877 -0.31644247 blue
# 5 1.20716737 -0.91380874 blue
# 6 0.01067475 1.02004679 blue
# 7 0.01067475 1.02004679 blue
# 8 1.79732108 -0.04072946 blue
# 9 0.01067475 1.02004679 blue
# 10 1.79732108 -0.04072946 blue
# # ... with 30 more rows
Additionally, use outdat %>% ungroup() to remove grouping.
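In dplyr 1.0.0 and later, sample_frac is superseded by slice_sample, which takes the fraction through its prop argument. Below is a minimal sketch of the same idea. It assumes a recent dplyr, and it uses a made-up data frame (df2, with deliberately unequal group sizes) and an arbitrary seed that are not part of the original example:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data with unequal group sizes: 5 blue, 15 red, 20 yellow
df2 <- data.frame(
  X1 = rnorm(40),
  color = rep(c("blue", "red", "yellow"), times = c(5, 15, 20))
)

# Resample each group with replacement, drawing as many rows as the group has
boot <- df2 %>%
  group_by(color) %>%
  slice_sample(prop = 1, replace = TRUE) %>%
  ungroup()

# Compare group sizes before and after resampling
count(df2, color)
count(boot, color)
```

The two count() calls should report the same size for every color, since prop = 1 is applied within each group rather than to the whole data frame.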