
R Shuffle and randomize columns of a data table

Tags: r, data.table

Given a data.table such as:

data.table::data.table(a = c(1,2,3,4), b = c("red","blue","yellow","green"), c = c(TRUE, FALSE, TRUE, FALSE), d = c(21, 45, 34, 26))

   a      b     c  d
1: 1    red  TRUE 21
2: 2   blue FALSE 45
3: 3 yellow  TRUE 34
4: 4  green FALSE 26

where a is a unique row identifier, how can I randomize/anonymize the data so that the values in each of the other columns are shuffled independently within that column? This would produce a random data.table that looks something like:

   a      b     c  d
1: 1  green  TRUE 26
2: 2 yellow FALSE 45
3: 3    red FALSE 21
4: 4   blue  TRUE 34

Dylan Russell asked Sep 19 '20 23:09


1 Answer

If each column should be shuffled independently, so that values no longer line up with their original rows, apply sample to the columns named in .SDcols by looping over them with lapply, and assign (:=) the result back to those columns:

dt1[, (2:4) := lapply(.SD, sample), .SDcols = 2:4]

Output (the result is random, so it will differ between runs):

dt1
#   a      b     c  d
#1: 1   blue FALSE 34
#2: 2    red  TRUE 21
#3: 3  green FALSE 45
#4: 4 yellow  TRUE 26
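
Because := modifies dt1 in place and sample() draws a new permutation on each call, it can help to work on a copy and fix a seed while testing. The following is a minimal sketch under those assumptions; the seed value 42 and the names `shuffled` and `cols` are illustrative and not part of the original answer.

```r
library(data.table)

dt1 <- data.table(
  a = c(1, 2, 3, 4),
  b = c("red", "blue", "yellow", "green"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(21, 45, 34, 26)
)

set.seed(42)                           # arbitrary seed, only for reproducibility
shuffled <- copy(dt1)                  # := updates by reference, so shuffle a copy
cols <- setdiff(names(shuffled), "a")  # every column except the identifier

# Permute each selected column independently of the others
shuffled[, (cols) := lapply(.SD, sample), .SDcols = cols]

print(shuffled)
```

Each column of `shuffled` holds the same set of values as in dt1; only their row positions change, and column a stays in its original order.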

Another option is set(), which updates one column at a time by reference:

for (j in names(dt1)[-1]) {
  set(dt1, i = NULL, j = j, value = sample(dt1[[j]]))
}
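
The loop above could also be wrapped in a small function that returns a shuffled copy instead of altering its input. This is only a sketch: `shuffle_columns` and its `id_cols` argument are hypothetical names, not part of data.table. It indexes with sample.int() rather than calling sample() on the column, since sample(x) on a single number x >= 1 draws from 1:x, which would matter for a one-row table.

```r
library(data.table)

# Hypothetical helper: shuffle every column except those in `id_cols`
shuffle_columns <- function(dt, id_cols = "a") {
  out <- copy(dt)  # leave the caller's table untouched
  for (j in setdiff(names(out), id_cols)) {
    # Permute row positions, then reorder the column's values accordingly
    set(out, i = NULL, j = j, value = out[[j]][sample.int(nrow(out))])
  }
  out
}

dt1 <- data.table(
  a = c(1, 2, 3, 4),
  b = c("red", "blue", "yellow", "green"),
  c = c(TRUE, FALSE, TRUE, FALSE),
  d = c(21, 45, 34, 26)
)

anon <- shuffle_columns(dt1)
print(anon)
```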

Data:

dt1 <- data.table::data.table(
      a = c(1,2,3, 4),
      b = c("red","blue","yellow", "green"), 
      c = c(TRUE, FALSE, TRUE, FALSE), 
      d = c(21, 45, 34, 26)
   )

akrun answered Sep 28 '22 05:09