I have a dataset of 1000 rows with the following structure (geslacht = sex, leeftijd = age):
device geslacht leeftijd type1 type2
1 mob 0 53 C 3
2 tab 1 64 G 7
3 pc 1 50 G 7
4 tab 0 75 C 3
5 mob 1 54 G 7
6 pc 1 58 H 8
7 pc 1 57 A 1
8 pc 0 68 E 5
9 pc 0 66 G 7
10 mob 0 45 C 3
11 tab 1 77 E 5
12 mob 1 16 A 1
I would like to draw a sample of 80 rows made up of 10 rows with type1 = A, 10 rows with type1 = B, and so on. Can anyone help me?
Here's how I would approach this using data.table:
library(data.table)
# .I holds row numbers; draw 10 of them per type1 group (with replacement)
indx <- setDT(df)[, .I[sample(.N, 10, replace = TRUE)], by = type1]$V1
df[indx]
# device geslacht leeftijd type1 type2
# 1: mob 0 45 C 3
# 2: mob 0 53 C 3
# 3: tab 0 75 C 3
# 4: mob 0 53 C 3
# 5: tab 0 75 C 3
# 6: mob 0 45 C 3
# 7: tab 0 75 C 3
# 8: mob 0 53 C 3
# 9: mob 0 53 C 3
# 10: mob 0 53 C 3
# 11: mob 1 54 G 7
#...
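To try the data.table approach on the example rows, here is a minimal reproducible sketch. It assumes the data frame is called df and rebuilds it from the 12 rows shown in the question; the set.seed() value is arbitrary. It uses as.data.table() instead of setDT() so that df itself stays a plain data.frame.

```r
library(data.table)

# Rebuild the 12 example rows from the question (column names kept as-is)
df <- data.frame(
  device   = c("mob", "tab", "pc", "tab", "mob", "pc", "pc", "pc", "pc", "mob", "tab", "mob"),
  geslacht = c(0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1),
  leeftijd = c(53, 64, 50, 75, 54, 58, 57, 68, 66, 45, 77, 16),
  type1    = c("C", "G", "G", "C", "G", "H", "A", "E", "G", "C", "E", "A"),
  type2    = c(3, 7, 7, 3, 7, 8, 1, 5, 7, 3, 5, 1)
)

set.seed(1)              # make the random draw reproducible
dt <- as.data.table(df)  # copy, so df itself is left unchanged

# Row numbers of 10 randomly drawn rows per type1 group, then subset by them
indx <- dt[, .I[sample(.N, 10, replace = TRUE)], by = type1]$V1
res <- dt[indx]

# Every type1 group should now contribute exactly 10 rows
res[, .N, by = type1]
```

The example only has five type1 groups (A, C, E, G, H), so the result has 50 rows rather than 80.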
Or, more simply:
setDT(df)[, .SD[sample(.N, 10, replace = TRUE)], by = type1]
Basically, we sample row indexes within each type1 group (with replacement, since in the example data every group has fewer than 10 rows) and then subset the data by those indexes.
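Sampling with replacement is only needed because the example groups are small. If every type1 group in the full 1000-row data has at least 10 rows, sampling without replacement avoids duplicate rows. Below is a minimal sketch on simulated stand-in data (big and id are hypothetical), capping the draw at the group size with min() so that a smaller group is returned whole instead of making sample() fail:

```r
library(data.table)

set.seed(42)
# Hypothetical stand-in for the real 1000-row data: type1 drawn from A..H
big <- data.table(
  id    = 1:1000,
  type1 = sample(LETTERS[1:8], 1000, replace = TRUE)
)

# Draw up to 10 distinct rows per group (no replacement);
# groups with fewer than 10 rows would be returned in full
res <- big[, .SD[sample(.N, min(.N, 10))], by = type1]

res[, .N, by = type1]
```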
Similarly, with dplyr you could do:
library(dplyr)
df %>%
  group_by(type1) %>%
  sample_n(10, replace = TRUE)
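In dplyr 1.0.0 and later, sample_n() still works but is superseded by slice_sample(). A minimal sketch of the same grouped draw with slice_sample(), on a small hypothetical data frame (id and the group sizes are made up for illustration):

```r
library(dplyr)

# Small hypothetical data frame with three type1 groups of different sizes
df <- data.frame(
  id    = 1:6,
  type1 = c("A", "A", "B", "C", "C", "C")
)

set.seed(1)
res <- df %>%
  group_by(type1) %>%
  slice_sample(n = 10, replace = TRUE) %>%  # 10 rows per group, with replacement
  ungroup()

count(res, type1)
```

If the real groups all have at least 10 rows, dropping replace = TRUE gives 10 distinct rows per group. According to the dplyr documentation, when n exceeds a group's size and replace = FALSE, slice_sample() silently truncates to the group size rather than raising an error.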