Stratified sampling on factor

Question

I have a dataset of 1000 rows with the following structure:

     device geslacht leeftijd type1 type2
1       mob        0       53     C     3
2       tab        1       64     G     7
3        pc        1       50     G     7
4       tab        0       75     C     3
5       mob        1       54     G     7
6        pc        1       58     H     8
7        pc        1       57     A     1
8        pc        0       68     E     5
9        pc        0       66     G     7
10      mob        0       45     C     3
11      tab        1       77     E     5
12      mob        1       16     A     1

I would like to make a sample of 80 rows, composed of 10 rows with type1 = A, 10 rows with type1 = B, and so on. Is there anyone who can help me?

David Arenburg · Accepted Answer

Here's how I would approach this using data.table:

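library(data.table)
# .I holds the row numbers of df; within each type1 group, draw 10 of them with replacement
indx <- setDT(df)[, .I[sample(.N, 10, replace = TRUE)], by = type1]$V1
# Subset df by the sampled row numbers
df[indx]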
#     device geslacht leeftijd type1 type2
#  1:    mob        0       45     C     3
#  2:    mob        0       53     C     3
#  3:    tab        0       75     C     3
#  4:    mob        0       53     C     3
#  5:    tab        0       75     C     3
#  6:    mob        0       45     C     3
#  7:    tab        0       75     C     3
#  8:    mob        0       53     C     3
#  9:    mob        0       53     C     3
# 10:    mob        0       53     C     3
# 11:    mob        1       54     G     7
#...

Or a simpler version would be:

setDT(df)[, .SD[sample(.N, 10, replace = TRUE)], by = type1]

Basically, we sample from the row indexes within each group of type1 (with replacement, as you have fewer than 10 rows within each group) and then subset the data by these indexes.
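To make the index-sampling idea concrete without any packages, here is a minimal base R sketch. It rebuilds the twelve rows shown in the question as a toy `df`; the names `rows_by_group`, `picked` and `stratified` and the seed are my own choices for illustration, not part of the original answer.

```r
# Toy data: the twelve rows shown in the question
df <- data.frame(
  device   = c("mob", "tab", "pc", "tab", "mob", "pc", "pc", "pc", "pc", "mob", "tab", "mob"),
  geslacht = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  leeftijd = c(53, 64, 50, 75, 54, 58, 57, 68, 66, 45, 77, 16),
  type1    = c("C", "G", "G", "C", "G", "H", "A", "E", "G", "C", "E", "A"),
  type2    = c(3, 7, 7, 3, 7, 8, 1, 5, 7, 3, 5, 1)
)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Row numbers of df, split into one vector per level of type1
rows_by_group <- split(seq_len(nrow(df)), df$type1)

# Within each group, draw 10 positions with replacement and map them back
# to row numbers. sample.int() on the group length avoids the surprise of
# sample(x) treating a single number x as 1:x when a group has one row.
picked <- unlist(lapply(rows_by_group, function(idx) {
  idx[sample.int(length(idx), size = 10, replace = TRUE)]
}), use.names = FALSE)

# Subset the data by the sampled row numbers
stratified <- df[picked, ]
table(stratified$type1)
```

On the full 1000-row data, one option would be to pass `replace = length(idx) < 10` instead, so that replacement is only used for groups that are too small to supply 10 distinct rows.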

Similarly, with dplyr you could do:

library(dplyr)
df %>% 
  group_by(type1) %>%
  sample_n(10, replace = TRUE)
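In dplyr 1.0.0 and later, `sample_n()` is superseded by `slice_sample()`, so the same idea could also be written as below. This is a sketch that assumes a recent dplyr and a toy `df` rebuilt from the rows shown in the question; `result` is just an illustrative name.

```r
library(dplyr)

# Toy data: the twelve rows shown in the question
df <- data.frame(
  device   = c("mob", "tab", "pc", "tab", "mob", "pc", "pc", "pc", "pc", "mob", "tab", "mob"),
  geslacht = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  leeftijd = c(53, 64, 50, 75, 54, 58, 57, 68, 66, 45, 77, 16),
  type1    = c("C", "G", "G", "C", "G", "H", "A", "E", "G", "C", "E", "A"),
  type2    = c(3, 7, 7, 3, 7, 8, 1, 5, 7, 3, 5, 1)
)

# Draw 10 rows per type1 group, with replacement because the groups are small
result <- df %>%
  group_by(type1) %>%
  slice_sample(n = 10, replace = TRUE) %>%
  ungroup()

# Number of sampled rows per group
count(result, type1)
```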

Tags: dataframe, r, sampling

karmabob
