
Randomise order of groups in R data table while preserving internal order of groups

Tags: r, data.table

In R, I have the following sample data table:

library(data.table)
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5", "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq(.N), by = Group]

Which looks like this:

# Input:
#
    Group InternalOrder
 1:    d1             1
 2:    d1             2
 3:    d1             3
 4:    d1             4
 5:    d2             1
 6:    d3             1
 7:    d3             2
 8:    d4             1
 9:    d5             1
10:    d5             2
11:    d5             3
12:    d6             1
13:    d7             1
14:    d7             2
15:    d7             3
16:    d7             4
17:    d7             5

My goal is to randomise the order of groups in the data table x while preserving the internal order of each group.

I have already worked out a solution:

# Number of rows in each group, in order of first appearance
groupsizes <- x[, .N, by = Group]$N
set.seed(10)
# Draw a random permutation of the group IDs and repeat each ID once per row of its group
x[, RandomGroupID := rep(sample(uniqueN(Group)), groupsizes)]
# Re-order the data by random group ID, then by internal order
setorder(x, RandomGroupID, InternalOrder)

This gives the desired output:

# Output (as desired):

    Group InternalOrder RandomGroupID
 1:    d5             1             1
 2:    d5             2             1
 3:    d5             3             1
 4:    d2             1             2
 5:    d3             1             3
 6:    d3             2             3
 7:    d1             1             4
 8:    d1             2             4
 9:    d1             3             4
10:    d1             4             4
11:    d4             1             5
12:    d7             1             6
13:    d7             2             6
14:    d7             3             6
15:    d7             4             6
16:    d7             5             6
17:    d6             1             7

Since I am trying to improve my data table skills, I would like to know whether there is a nicer, more idiomatic solution. Ideally it would not need the intermediate vector groupsizes, and would instead assign the new column with typical data table syntax, using the by argument in combination with .GRP, .I or the like. I have thought of something like x[, RandomGroupIDAlternative := rep(sample(c(1:length(unique(x$Group))), replace = F), .GRP), by = Group], which obviously does not give the desired output.

I am looking forward to your comments and to seeing alternative solutions to this problem.

MichaelU asked Jan 02 '19 09:01


3 Answers

This can be done idiomatically by joining to a randomised list of groups.

x[sample(unique(Group)), on = "Group"][, RandomGroupID := .GRP, by = Group][]
anotherfred answered Nov 04 '22 05:11
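
For readers who want to see what the one-liner above does, here is a step-by-step sketch of the same join. It is not part of the original answer; the names shuffled and res and the seed value are chosen here purely for illustration.

```r
library(data.table)

# Rebuild the sample table from the question
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq_len(.N), by = Group]

set.seed(10)  # arbitrary seed, only for reproducibility

# Step 1: shuffle the distinct group labels (hypothetical helper table)
shuffled <- data.table(Group = sample(unique(x$Group)))

# Step 2: join x onto the shuffled labels; the result follows the row order
# of `shuffled`, and the matching rows of x keep their relative order
res <- x[shuffled, on = "Group"]

# Step 3: number the groups in their new order of appearance
res[, RandomGroupID := .GRP, by = Group]

# Sanity checks: same number of rows, internal order intact, groups contiguous
stopifnot(nrow(res) == nrow(x))
stopifnot(res[, all(InternalOrder == seq_len(.N)), by = Group]$V1)
stopifnot(!is.unsorted(res$RandomGroupID))
```

Because the join returns a new table, x itself is left in its original order; assign the result back to x if the shuffled order should replace it.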


You can also do it using split and rbindlist:

x_new <- rbindlist(sample(split(x, by='Group')))

    Group InternalOrder
 1:    d4             1
 2:    d1             1
 3:    d1             2
 4:    d1             3
 5:    d1             4
 6:    d5             1
 7:    d5             2
 8:    d5             3
 9:    d6             1
10:    d7             1
11:    d7             2
12:    d7             3
13:    d7             4
14:    d7             5
15:    d3             1
16:    d3             2
17:    d2             1
YOLO answered Nov 04 '22 05:11
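
The split and rbindlist result above has no RandomGroupID column. The sketch below, which is not part of the original answer, shows one way it could be added afterwards: since each group is contiguous after rbindlist, a run-length ID via rleid should number the groups in their new order. The seed value is an arbitrary choice for reproducibility.

```r
library(data.table)

# Rebuild the sample table from the question
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq_len(.N), by = Group]

set.seed(1)  # arbitrary seed, only for reproducibility

# split() gives a list with one data.table per group, sample() permutes
# that list, and rbindlist() stacks the pieces in the permuted order
x_new <- rbindlist(sample(split(x, by = "Group")))

# Each group is one contiguous run of rows, so a run-length ID
# numbers the groups 1, 2, ... in their new order
x_new[, RandomGroupID := rleid(Group)]

# Sanity checks: no rows lost and internal order preserved
stopifnot(nrow(x_new) == nrow(x))
stopifnot(x_new[, all(InternalOrder == seq_len(.N)), by = Group]$V1)
```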


Here's one possibility:

x[, RandomGroupID := runif(1), by = Group ]
x[order(RandomGroupID), RandomGroupID := as.numeric(.GRP), by = Group]

Output:

    Group InternalOrder RandomGroupID
 1:    d1             1             4
 2:    d1             2             4
 3:    d1             3             4
 4:    d1             4             4
 5:    d2             1             7
 6:    d3             1             6
 7:    d3             2             6
 8:    d4             1             1
 9:    d5             1             2
10:    d5             2             2
11:    d5             3             2
12:    d6             1             5
13:    d7             1             3
14:    d7             2             3
15:    d7             3             3
16:    d7             4             3
17:    d7             5             3
Bram Van Rensbergen answered Nov 04 '22 07:11
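
Note that the output above is still in the original row order, so a final setorder(x, RandomGroupID, InternalOrder), as in the question, is needed to actually shuffle the rows. A closely related sketch, not taken from the answer, draws the permutation once and looks it up with .GRP inside by, which is roughly the pattern the question asks about. The name perm and the seed are illustrative assumptions.

```r
library(data.table)

# Rebuild the sample table from the question
x <- data.table(Group = c("d1", "d1", "d1", "d1", "d2", "d3", "d3", "d4", "d5",
                          "d5", "d5", "d6", "d7", "d7", "d7", "d7", "d7"))
x[, InternalOrder := seq_len(.N), by = Group]

set.seed(10)  # arbitrary seed, only for reproducibility

# Random permutation of 1..(number of groups), drawn once outside the by-loop
perm <- sample(uniqueN(x$Group))

# .GRP is the running group counter, so perm[.GRP] picks one random ID per
# group; the single value is recycled across all rows of that group
x[, RandomGroupID := perm[.GRP], by = Group]

# Sort by the random ID, then by internal order, to shuffle the groups
setorder(x, RandomGroupID, InternalOrder)

# Sanity checks: internal order preserved and one distinct ID per group
stopifnot(x[, all(InternalOrder == seq_len(.N)), by = Group]$V1)
stopifnot(uniqueN(x$RandomGroupID) == uniqueN(x$Group))
```

The permutation still lives in a separate vector, but no group sizes have to be computed by hand, and the assignment itself uses only by and .GRP.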