
Stratified sampling on factor

I have a dataset of 1000 rows with the following structure:

     device geslacht leeftijd type1 type2
1       mob        0       53     C     3
2       tab        1       64     G     7
3        pc        1       50     G     7
4       tab        0       75     C     3
5       mob        1       54     G     7
6        pc        1       58     H     8
7        pc        1       57     A     1
8        pc        0       68     E     5
9        pc        0       66     G     7
10      mob        0       45     C     3
11      tab        1       77     E     5
12      mob        1       16     A     1

I would like to draw a sample of 80 rows, composed of 10 rows with type1 = A, 10 rows with type1 = B, and so on. Can anyone help me?

asked May 07 '15 09:05 by karmabob


1 Answer

Here's how I would approach this using data.table:

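library(data.table)
# Within each type1 group, draw 10 row positions and map them to global row indexes via .I
indx <- setDT(df)[, .I[sample(.N, 10, replace = TRUE)], by = type1]$V1
# Subset the data by the sampled row indexes
df[indx]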
#     device geslacht leeftijd type1 type2
#  1:    mob        0       45     C     3
#  2:    mob        0       53     C     3
#  3:    tab        0       75     C     3
#  4:    mob        0       53     C     3
#  5:    tab        0       75     C     3
#  6:    mob        0       45     C     3
#  7:    tab        0       75     C     3
#  8:    mob        0       53     C     3
#  9:    mob        0       53     C     3
# 10:    mob        0       53     C     3
# 11:    mob        1       54     G     7
#...

Or, a simpler version would be:

setDT(df)[, .SD[sample(.N, 10, replace = TRUE)], by = type1]

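Basically, we sample from the row indexes within each group of type1 and then subset the data by those indexes. The sampling is done with replacement because the example data has fewer than 10 rows within each group. If every group in your full dataset has at least 10 rows, you can set replace = FALSE to avoid duplicated rows.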
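
As a rough sketch of how this could look on data where the groups are large enough, the following builds a made-up 1000-row table and draws 10 rows per level of type1, falling back to sampling with replacement only for groups with fewer than 10 rows. The simulated df, the seed, and the name n_per_group are assumptions for illustration and are not taken from the question.

```r
library(data.table)

set.seed(42)  # only to make this illustration reproducible

# Hypothetical stand-in for the real data: 1000 rows, 8 levels of type1 (A to H)
df <- data.table(
  device   = sample(c("mob", "tab", "pc"), 1000, replace = TRUE),
  leeftijd = sample(16:80, 1000, replace = TRUE),
  type1    = sample(LETTERS[1:8], 1000, replace = TRUE)
)

n_per_group <- 10  # illustrative name for the per-group sample size

# Sample without replacement when the group has enough rows,
# otherwise fall back to sampling with replacement
indx <- df[, .I[sample(.N, n_per_group, replace = .N < n_per_group)], by = type1]$V1
res <- df[indx]

# Check the result: one row per level, each with N equal to n_per_group
res[, .N, by = type1]
```

Using `.N < n_per_group` as the replace argument keeps the sample free of duplicates wherever that is possible, while still returning exactly 10 rows for small groups.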


Similarly, with dplyr you could do:

library(dplyr)
df %>% 
  group_by(type1) %>%
  sample_n(10, replace = TRUE)
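
In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), so the same idea could be sketched as below. The small df here is made-up toy data that only mimics the type1 column from the question.

```r
library(dplyr)

# Hypothetical toy data with a type1 column like the one in the question
df <- data.frame(
  device = c("mob", "tab", "pc", "tab", "mob", "pc"),
  type1  = c("C", "G", "G", "C", "G", "H")
)

res <- df %>%
  group_by(type1) %>%
  slice_sample(n = 10, replace = TRUE) %>%  # 10 draws per group, with replacement
  ungroup()

# Check the result: each level of type1 contributes 10 rows
count(res, type1)
```
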

answered Oct 06 '22 13:10 by David Arenburg