Given a data.table such as:
data.table::data.table(a = c(1,2,3,4), b = c("red","blue","yellow","green"), c = c(TRUE, FALSE, TRUE, FALSE), d = c(21, 45, 34, 26))
   a      b     c  d
1: 1    red  TRUE 21
2: 2   blue FALSE 45
3: 3 yellow  TRUE 34
4: 4  green FALSE 26
where a is a unique row identifier, how could I randomize/anonymize the data so that the values of each column are shuffled within that column? This would create a random data.table that looks something like:
   a      b     c  d
1: 1  green  TRUE 26
2: 2 yellow FALSE 45
3: 3    red FALSE 21
4: 4   blue  TRUE 34
Rearranging columns alphabetically in R: columns can be put in alphabetical order with dplyr's select() combined with order() and the pipe operator. Alternatively, order() alone is enough in base R.
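The order()-only approach can be sketched as follows. This is a minimal base R illustration; the data frame df and its contents are made up for the example and are not from the question.

```r
# Hypothetical data frame whose columns are deliberately out of order
df <- data.frame(d = c(21, 45), b = c("red", "blue"), a = c(1, 2))

# order() on the column names gives the positions that sort them,
# and indexing the columns by those positions reorders the data frame
df_sorted <- df[, order(names(df))]

names(df_sorted)
```

This only changes the order of the columns; the values inside each column stay in their original rows.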
The rows of a data frame can be shuffled with sample(). Calling sample() on nrow() of the data frame yields a random permutation of the row indices, and indexing the data frame with that permutation returns all rows in shuffled order.
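A minimal sketch of that row shuffle in base R, assuming a small made-up data frame df. The set.seed() call is an addition here, included only so that repeated runs give the same permutation.

```r
# Hypothetical data frame for illustration
df <- data.frame(a = 1:4, b = c("red", "blue", "yellow", "green"))

set.seed(42)  # assumption: fixed seed only for reproducibility

# sample(nrow(df)) is a random permutation of 1..nrow(df);
# using it as the row index returns every row exactly once
shuffled <- df[sample(nrow(df)), ]
```

Note that whole rows move together here, so each value of b stays attached to its original a. That is different from shuffling each column independently, which is what the question asks for.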
If the shuffle should be independent for each column, so that values are no longer tied to their rows, use sample on the columns specified in .SDcols by looping over them with lapply, and assign (:=) the output back to those columns:
dt1[, (2:4) := lapply(.SD, sample), .SDcols = 2:4]
-output
dt1
#   a      b     c  d
#1: 1   blue FALSE 34
#2: 2    red  TRUE 21
#3: 3  green FALSE 45
#4: 4 yellow  TRUE 26
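One way to sanity-check the result is to confirm that every shuffled column still holds the same set of values and that the identifier column a is untouched. The sketch below assumes the four-row dt1 from the question; the names orig, cols and same_values are introduced here for illustration, and the seed is only for reproducibility.

```r
library(data.table)

dt1 <- data.table(
  a = c(1, 2, 3, 4),
  b = c("red", "blue", "yellow", "green"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(21, 45, 34, 26)
)

orig <- copy(dt1)  # := changes dt1 in place, so keep a copy to compare against

set.seed(1)        # assumption: fixed seed only for reproducibility
cols <- c("b", "c", "d")
dt1[, (cols) := lapply(.SD, sample), .SDcols = cols]

# Each shuffled column should contain the same values as before, in some order
same_values <- sapply(cols, function(j) identical(sort(orig[[j]]), sort(dt1[[j]])))
same_values
```

Because each column is permuted separately, the combinations of values within a row will generally differ from the original rows.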
Another option is set, which also updates the columns by reference:
for(j in names(dt1)[-1]) {
  set(dt1, i = NULL, j = j, value = sample(dt1[[j]]))
}
The dt1 used above was created with:
dt1 <- data.table::data.table(
  a = c(1, 2, 3, 4),
  b = c("red", "blue", "yellow", "green"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(21, 45, 34, 26)
)
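Both := and set modify dt1 in place. If the original table needs to be kept, a possible variant is to shuffle a copy instead. The helper below is a hypothetical sketch, not part of data.table: the name shuffle_columns and its keep argument are assumptions made for this example.

```r
library(data.table)

# Hypothetical helper: returns a shuffled copy and leaves the input untouched.
# `keep` names the columns (such as the row identifier) that must not move.
shuffle_columns <- function(dt, keep = "a") {
  out <- copy(dt)  # copy() avoids changing the caller's table by reference
  for (j in setdiff(names(out), keep)) {
    # Permute by index rather than calling sample() on the values directly,
    # so a one-row numeric column is not treated as sample(1:x)
    set(out, i = NULL, j = j, value = out[[j]][sample.int(nrow(out))])
  }
  out
}

dt1 <- data.table(
  a = c(1, 2, 3, 4),
  b = c("red", "blue", "yellow", "green"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(21, 45, 34, 26)
)

shuffled <- shuffle_columns(dt1)
```

After this call dt1 is unchanged, while shuffled holds the same values per column in a random order.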