
Sample n random draw within group with different nrows

Tags: r, dplyr

How can I draw n rows from a group where each group has a different number of rows?

df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <-  rep(c("blue", "red", "yellow", "pink"), each=10)

I've tried:

library(dplyr)
outdat <- df %>% 
  group_by(color) %>% 
  sample_n(nrow(.), replace = TRUE)
outdat

but here nrow(.) is the number of rows in the whole df (40), not in each group's subset, so every group is sampled 40 times.
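To make the problem concrete, here is a minimal sketch that counts the rows drawn per group by the attempt above. The seed and the name `draws` are assumptions added for illustration. In a magrittr pipe, `.` stands for the whole left-hand side, which here is the full grouped data frame.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# `.` is the full grouped data frame, so nrow(.) is 40 for every group
draws <- df %>%
  group_by(color) %>%
  sample_n(nrow(.), replace = TRUE) %>%
  count(color)

# Each of the 4 groups contributes nrow(df) rows instead of its own 10
draws
```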

This SO post is close, but it draws a fixed number of rows. I need the number of draws to match each group's own size, within dplyr.

asked Jan 17 '26 05:01 by Vedda


1 Answer

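A workaround is to use sample_frac. On a grouped data frame the fraction applies to each group, so a fraction of 1 with replace = TRUE draws as many rows from each group as that group has: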

outdat <- df %>%
    group_by(color) %>%
    sample_frac(1, replace = TRUE)
outdat
# # A tibble: 40 x 3
# # Groups:   color [4]
#             X1          X2 color
#          <dbl>       <dbl> <chr>
#  1  0.69256186  0.97180252  blue
#  2  1.54384827 -0.20268802  blue
#  3 -1.20068240 -0.45402013  blue
#  4  2.63407877 -0.31644247  blue
#  5  1.20716737 -0.91380874  blue
#  6  0.01067475  1.02004679  blue
#  7  0.01067475  1.02004679  blue
#  8  1.79732108 -0.04072946  blue
#  9  0.01067475  1.02004679  blue
# 10  1.79732108 -0.04072946  blue
# # ... with 30 more rows

Additionally, use outdat %>% ungroup() to remove grouping.
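The example data has 10 rows in every group, so it does not show the unequal-size case directly. Below is a minimal sketch with hypothetical data (`df2`, with group sizes 5, 10 and 25, is an assumption and not from the question). It uses slice_sample(prop = 1), which superseded sample_frac in dplyr 1.0.0 and later, and a second variant built on n(), which returns the current group's size inside slice().

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data with unequal group sizes: 5 blue, 10 red, 25 yellow
df2 <- data.frame(
  x = rnorm(40),
  color = rep(c("blue", "red", "yellow"), times = c(5, 10, 25))
)

# Resample each group with replacement, keeping each group's own size
boot <- df2 %>%
  group_by(color) %>%
  slice_sample(prop = 1, replace = TRUE) %>%
  ungroup()

# Same idea via n(): draw n() row positions within each group
boot2 <- df2 %>%
  group_by(color) %>%
  slice(sample.int(n(), replace = TRUE)) %>%
  ungroup()

# Per-group row counts match the original data
count(boot, color)
```

Either variant should work when only the group-wise resample is needed. The n() form is handy if the draw size has to be some other function of the group size.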

answered Jan 19 '26 18:01 by mt1022


