In R, I have the following sample data table:
library(data.table)
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5", "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]
Which looks like this:
# Input:
#
Group InternalOrder
1: d1 1
2: d1 2
3: d1 3
4: d1 4
5: d2 1
6: d3 1
7: d3 2
8: d4 1
9: d5 1
10: d5 2
11: d5 3
12: d6 1
13: d7 1
14: d7 2
15: d7 3
16: d7 4
17: d7 5
My goal is to randomise the order of groups in the data table x while preserving the internal order of each group.
I have already worked out a solution:
groupsizes <- x[, .N, by = Group]$N # Get number of elements (= rows) for each group
set.seed(10)
x[, RandomGroupID := rep(sample(1:length(unique(x$Group)), replace = FALSE), groupsizes)] # Make new column with random ID for each group
setorder(x, RandomGroupID, InternalOrder) # Re-order data by random group ID and internal order
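To check that a reordering like this is valid, a small test helps. The sketch below reruns the question's code on the sample data and then applies a helper, is_valid_shuffle (a hypothetical name, not part of data.table), which confirms that each group occupies a single contiguous block of rows and that InternalOrder still runs 1, 2, ... within it. Note that rep(..., groupsizes) assumes the rows of each group are already contiguous in x, as they are in the sample data.

```r
library(data.table)

# Sample data from the question
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

# The question's approach: one random ID per group, repeated over its rows.
# This assumes each group's rows are already contiguous in x.
groupsizes <- x[, .N, by = Group]$N
set.seed(10)
x[, RandomGroupID := rep(sample(1:length(unique(x$Group))), groupsizes)]
setorder(x, RandomGroupID, InternalOrder)

# Hypothetical helper: TRUE if every group forms one contiguous block of rows
# and InternalOrder runs 1, 2, ..., .N inside each block
is_valid_shuffle <- function(dt) {
  runs <- rle(dt$Group)                          # runs of identical group labels
  contiguous <- anyDuplicated(runs$values) == 0L # no label starts a second run
  in_order <- dt[, all(InternalOrder == seq_len(.N)), by = Group][, all(V1)]
  contiguous && in_order
}

is_valid_shuffle(x)
```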
that gives the desired output:
# Output (as desired):
Group InternalOrder RandomGroupID
1: d5 1 1
2: d5 2 1
3: d5 3 1
4: d2 1 2
5: d3 1 3
6: d3 2 3
7: d1 1 4
8: d1 2 4
9: d1 3 4
10: d1 4 4
11: d4 1 5
12: d7 1 6
13: d7 2 6
14: d7 3 6
15: d7 4 6
16: d7 5 6
17: d6 1 7
Since I am trying to improve my data table skills, I would like to know if there is a nicer, more idiomatic solution that does not need the intermediate vector groupsizes and instead assigns the new column with typical data.table syntax, i.e. the by argument in combination with .GRP, .I or the like.
I have thought of something like x[, RandomGroupIDAlternative := rep(sample(c(1:length(unique(x$Group))), replace = F), .GRP), by = Group]
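which obviously does not give the desired output: within each group, rep(..., .GRP) repeats the whole permutation .GRP times instead of picking a single random ID for that group.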
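The by/.GRP idea can be made to work by drawing the permutation once, outside the grouped call, and letting .GRP (1 for the first group encountered, 2 for the second, and so on) index into it. This is a sketch, not one of the posted answers; perm is a hypothetical helper name. Because .GRP numbers groups in order of first appearance, the same order x[, .N, by = Group] uses, with the same seed it should reproduce the question's IDs, and since the ID is assigned per group rather than by position it does not rely on the groups being contiguous.

```r
library(data.table)

x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

set.seed(10)
# Draw one random permutation of 1..(number of groups), outside of `by`
perm <- sample(uniqueN(x$Group))

# Inside `by`, .GRP is the current group's counter, so perm[.GRP] is a single
# random ID for that group; data.table recycles it over the group's rows
x[, RandomGroupIDAlternative := perm[.GRP], by = Group]

# Move the rows: sort by the random group ID, keeping the internal order
setorder(x, RandomGroupIDAlternative, InternalOrder)
x
```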
I am looking forward to your comments and to seeing alternative solutions to this problem.
This can be done idiomatically by joining to a randomised list of groups.
x[sample(unique(Group)), on = "Group"][, RandomGroupID := .GRP, by = Group][]
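A step-by-step version of the join may be easier to follow. Joining x on Group to a shuffled vector of group names returns, for each name in turn, all matching rows of x in their original order; .GRP then numbers the groups in that new order. This sketch spells out x$Group explicitly and uses the hypothetical names shuffled and res; set.seed(10) is borrowed from the question only for reproducibility.

```r
library(data.table)

x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

set.seed(10)
shuffled <- sample(unique(x$Group))   # group names in a random order

# Join: rows come back in the order of `shuffled`, and within each group
# in x's original row order
res <- x[shuffled, on = "Group"]

# .GRP counts groups in order of first appearance, i.e. the shuffled order
res[, RandomGroupID := .GRP, by = Group]
res
```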
You can also do it using split and rbindlist:
x_new <- rbindlist(sample(split(x, by='Group')))
Group InternalOrder
1: d4 1
2: d1 1
3: d1 2
4: d1 3
5: d1 4
6: d5 1
7: d5 2
8: d5 3
9: d6 1
10: d7 1
11: d7 2
12: d7 3
13: d7 4
14: d7 5
15: d3 1
16: d3 2
17: d2 1
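The same idea as a self-contained sketch: split() with by = "Group" is data.table's split method and returns a named list with one data.table per group (rows in their original order); sample() on a list permutes its elements, and rbindlist() stacks them back together. The RandomGroupID column at the end is an optional extra for comparison with the question's output, and pieces is a hypothetical name.

```r
library(data.table)

x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

set.seed(10)
# One data.table per group, named by group
pieces <- split(x, by = "Group")

# Shuffle the list of groups and bind them back into a single table
x_new <- rbindlist(sample(pieces))

# Optional: number the groups in their new order
x_new[, RandomGroupID := .GRP, by = Group]
x_new
```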
Here's one possibility:
x[, RandomGroupID := runif(1), by = Group ]
x[order(RandomGroupID), RandomGroupID := as.numeric(.GRP), by = Group]
Output (the rows keep their original order; only the group IDs are randomised):
Group InternalOrder RandomGroupID
1: d1 1 4
2: d1 2 4
3: d1 3 4
4: d1 4 4
5: d2 1 7
6: d3 1 6
7: d3 2 6
8: d4 1 1
9: d5 1 2
10: d5 2 2
11: d5 3 2
12: d6 1 5
13: d7 1 3
14: d7 2 3
15: d7 3 3
16: d7 4 3
17: d7 5 3
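The two lines above only label the groups; the rows themselves stay where they were. A minimal sketch of the full sequence, adding the question's setorder() call to actually move the rows, might look like this (set.seed(10) is included only for reproducibility):

```r
library(data.table)

x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

set.seed(10)
# Step 1: one random number per group, recycled over the group's rows
x[, RandomGroupID := runif(1), by = Group]

# Step 2: visit rows in order of that random number and replace it with the
# group counter, so groups are numbered 1, 2, ... in random order
x[order(RandomGroupID), RandomGroupID := as.numeric(.GRP), by = Group]

# Step 3: the labels alone do not move rows; sort to get the shuffled layout
setorder(x, RandomGroupID, InternalOrder)
x
```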