I have a dataframe (df1) like this.
   f1 f2 f3 f4 f5
d1  1  0  1  1  1
d2  1  0  0  1  0
d3  0  0  0  1  1
d4  0  1  0  0  1
The d1...d4 column holds the row names, and the f1...f5 row holds the column names.
When I run sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.
Is it possible to do the randomization row-wise or column-wise?
I want to randomize df1 column-wise, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):
   f1 f2 f3 f4 f5
d1  1  0  0  0  1
d2  0  1  0  1  1
d3  1  0  0  1  1
d4  0  0  1  1  0
Likewise, I also want to randomize df1 row-wise, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries could differ between rows). For example, a randomized df3 could look like this:
   f1 f2 f3 f4 f5
d1  0  1  1  1  1   <- two entries are different
d2  0  0  1  0  1   <- four entries are different
d3  1  0  0  0  1   <- two entries are different
d4  0  0  1  0  1   <- two entries are different
PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.
We can shuffle the rows of a dataframe with the sample() function. sample(nrow(df1)) returns the row indices 1 to nrow(df1) in random order, and using that vector as the row index of df1 returns all rows in shuffled order.
Given the R data.frame:
> df1
  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Shuffle row-wise:
> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0
By default, sample() randomly reorders the elements passed as the first argument, so the default size is the length of the input. (When the first argument is a single positive integer n, as in sample(nrow(df1)), it samples from 1:n.) Because replace=FALSE is the default, sampling is done without replacement, so every row index appears exactly once, which accomplishes a row-wise shuffle.
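To illustrate the defaults described above, here is a minimal sketch in base R. It only assumes the df1 from this answer; the set.seed() call and its seed value are arbitrary and are there just to make the run repeatable.

```r
# Rebuild the example data frame from this answer.
df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

set.seed(1)  # arbitrary seed, only for repeatability

# Given a single positive integer n, sample(n) returns a permutation of 1:n.
idx <- sample(nrow(df1))

# Sampling is without replacement by default, so every row index
# appears exactly once: no row is lost or duplicated.
stopifnot(identical(sort(idx), 1:4))

# Indexing with the permuted indices reorders whole rows.
df2 <- df1[idx, ]
print(df2)
```

Since whole rows are moved, the values within each row stay together and the column totals are unchanged.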
Shuffle column-wise:
> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
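The shuffles above move whole rows or whole columns around. The question asks for something slightly different: permuting the values inside each column (or inside each row) independently, so that the per-column (or per-row) count of 1s is conserved and every column (or row) ends up changed. One possible base R sketch follows. The helper name shuffle_changed is hypothetical, and the code assumes the data contain no NA values and that each column and row has more than one element.

```r
# Example data frame from the question (row names d1..d4, columns f1..f5).
df1 <- data.frame(f1 = c(1, 1, 0, 0),
                  f2 = c(0, 0, 0, 1),
                  f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0),
                  f5 = c(1, 0, 1, 1),
                  row.names = c("d1", "d2", "d3", "d4"))

# Hypothetical helper: reshuffle x until the result differs from x.
# A vector whose values are all equal can never change, so it is
# returned as is to avoid an endless loop. Assumes no NA values.
shuffle_changed <- function(x) {
  if (length(unique(x)) < 2) return(x)
  repeat {
    y <- sample(x)
    if (any(y != x)) return(y)
  }
}

set.seed(42)  # arbitrary seed, only for repeatability

# Column-wise: permute the values inside each column independently.
# Assigning into df_col[] keeps the row and column names of df1.
df_col <- df1
df_col[] <- lapply(df1, shuffle_changed)

# Row-wise: permute the values inside each row independently.
# apply() returns the permuted rows as columns, hence the t().
df_row <- df1
df_row[] <- t(apply(df1, 1, shuffle_changed))

# The counts of 1s are conserved per column and per row respectively.
stopifnot(all(colSums(df_col) == colSums(df1)))
stopifnot(all(rowSums(df_row) == rowSums(df1)))
```

The retry loop is a simple rejection approach: it is fine for small 0/1 vectors like these, but it makes no attempt to be efficient for long vectors that are almost constant.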
Another way to shuffle the data.frame is with the dplyr package (load it first with library(dplyr)):
row-wise:
df2 <- slice(df1, sample(1:n()))
or
df2 <- sample_frac(df1, 1L)
column-wise:
df2 <- select(df1, one_of(sample(names(df1))))