
How to randomize (or permute) a dataframe rowwise and columnwise?

I have a dataframe (df1) like this.

         f1   f2   f3   f4   f5
    d1   1    0    1    1    1
    d2   1    0    0    1    0
    d3   0    0    0    1    1
    d4   0    1    0    0    1

The d1...d4 column holds the row names, and the f1...f5 row holds the column names.

When I call sample(df1), I get a new data frame with the same count of 1s as df1. So the count of 1s is conserved for the whole data frame, but not for each row or each column.

Is it possible to do the randomization row-wise or column-wise?

I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):

         f1   f2   f3   f4   f5
    d1   1    0    0    0    1
    d2   0    1    0    1    1
    d3   1    0    0    1    1
    d4   0    0    1    1    0

Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries could differ between rows). For example, a randomized df3 could look something like this:

         f1   f2   f3   f4   f5
    d1   0    1    1    1    1  <- two entries are different
    d2   0    0    1    0    1  <- four entries are different
    d3   1    0    0    0    1  <- two entries are different
    d4   0    0    1    0    1  <- two entries are different

PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.

a83 asked Jun 21 '11 08:06




2 Answers

Given the R data.frame:

    > df1
      a b c
    1 1 1 0
    2 1 0 0
    3 0 1 0
    4 0 0 0

Shuffle row-wise:

    > df2 <- df1[sample(nrow(df1)),]
    > df2
      a b c
    3 0 1 0
    4 0 0 0
    2 1 0 0
    1 1 1 0

Given a single positive number n, sample(n) returns a random permutation of 1:n. By default the sample size equals the length of the input, and replace=FALSE (the default) means sampling is done without replacement, so each row index appears exactly once. Indexing df1 with that permutation therefore reorders its rows at random.
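The behaviour of sample() described here can be checked directly. This is a minimal sketch; the seed and the value n = 5 are arbitrary choices made for illustration:

```r
set.seed(42)                  # arbitrary seed, only for reproducibility

n <- 5
idx <- sample(n)              # a single number n is treated as 1:n

# Without replacement every index appears exactly once,
# so idx is a permutation of 1:n
is_permutation <- identical(sort(idx), 1:n)

# With replace = TRUE indices may repeat, so the result is
# generally not a shuffle
idx_repl <- sample(n, replace = TRUE)
```

Because a lone number is expanded to 1:n, some care is needed when sample() is applied to a numeric vector that may have length one.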

Shuffle column-wise:

    > df3 <- df1[,sample(ncol(df1))]
    > df3
      c a b
    1 0 1 1
    2 0 1 0
    3 0 0 1
    4 0 0 0
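Note that both commands in this answer move whole rows or whole columns. If, as in the question, the entries inside each column (or each row) should be permuted independently so that the per-column (or per-row) count of 1s is preserved, one possible base R sketch follows. The names col_shuffled and row_shuffled are illustrative only, and the code does not enforce the question's extra requirement that every column or row actually ends up different from the original:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

# Permute the entries within each column independently;
# column sums are preserved, row sums generally are not.
col_shuffled <- df1
col_shuffled[] <- lapply(df1, sample)

# Permute the entries within each row independently;
# row sums are preserved, column sums generally are not.
m <- as.matrix(df1)
row_m <- t(apply(m, 1, sample))   # apply() returns rows as columns, so transpose
dimnames(row_m) <- dimnames(m)    # restore the original row/column labels
row_shuffled <- as.data.frame(row_m)
```

To guarantee that a column or row really changes, the draw could be repeated until the result differs from the original, which is only possible when that column or row contains both 0s and 1s.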
pms answered Sep 27 '22 19:09


This is another way to shuffle the data.frame, using the dplyr package:

row-wise:

df2 <- slice(df1, sample(1:n())) 

or

df2 <- sample_frac(df1, 1L) 

column-wise:

df2 <- select(df1, one_of(sample(names(df1))))  
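For reference, the same two shuffles can be written with the newer dplyr verbs. This is a sketch assuming dplyr 1.0.0 or later, where slice_sample() supersedes sample_frac() and all_of() supersedes one_of(); df1 is rebuilt here from the example in the first answer, and the result names are illustrative:

```r
library(dplyr)

df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

# Shuffle whole rows: draw 100% of the rows without replacement
rows_shuffled <- slice_sample(df1, prop = 1)

# Shuffle whole columns: select every column, in a random order
cols_shuffled <- select(df1, all_of(sample(names(df1))))
```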
Enrique Pérez Herrero answered Sep 27 '22 20:09