In R, how do I locally shuffle a vector's elements

Tags:

r

sample

I have the following vector of numbers in R:

x <- 1:100  # i.e. c(1, 2, 3, 4, ..., 100)

I want to shuffle this vector "locally", controlled by an input number I'll call the "locality factor". For example, if the locality factor is 3, the first 3 elements are shuffled among themselves, then the next 3 elements, and so on. Is there an efficient way to do this? I know that calling sample() on the whole vector would jumble up the entire vector. Thanks in advance.

broccoli asked Jul 14 '13 15:07


2 Answers

Arun didn't like how inefficient my other answer was, so here's something very fast just for him ;)

It requires just one call each to runif() and order(), and doesn't use sample() at all.

x <- 1:100
k <- 3
n <- length(x)

x[order(rep(seq_len(ceiling(n/k)), each=k, length.out=n) + runif(n))]
#  [1]   3   1   2   6   5   4   8   9   7  11  12  10  13  14  15  18  16  17
# [19]  20  19  21  23  22  24  27  25  26  29  28  30  33  31  32  36  34  35
# [37]  37  38  39  40  41  42  43  44  45  47  48  46  51  49  50  52  54  53
# [55]  55  57  56  58  60  59  62  63  61  66  64  65  68  67  69  71  70  72
# [73]  75  74  73  76  77  78  81  80  79  84  82  83  86  85  87  89  88  90
# [91]  93  92  91  94  96  95  97  98  99 100
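To see why the one-liner above stays "local", here is a minimal sketch that builds the sort key step by step on a shorter vector. The names block, key, shuffled and same_blocks are mine, not from the answer, and the seed is arbitrary and only there to make the run repeatable.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
x <- 1:10
k <- 3
n <- length(x)

# Block id for each position: 1 1 1 2 2 2 3 3 3 4
block <- rep(seq_len(ceiling(n / k)), each = k, length.out = n)

# A uniform draw from (0, 1) keeps every key between its block id and the
# next one, so ordering by the key can only permute positions inside a block
key <- block + runif(n)
shuffled <- x[order(key)]

# Each block of the result holds the same values as the matching block of x
same_blocks <- all(mapply(setequal, split(shuffled, block), split(x, block)))
```

Because the keys of block i all lie strictly between i and i + 1, order() never moves an element across a block boundary; the random fractional part alone decides the order within a block.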
Josh O'Brien answered Sep 28 '22 08:09


General solution:

Edit: As @MatthewLundberg points out in the comments, the problem with repeated values in x (see the old answer below) is easily avoided by shuffling seq_along(x) instead, so that the shuffled values are indices into x. It looks like this:

k <- 3
x <- c(2,2,1, 1,3,4, 4,6,5, 3)
x.s <- seq_along(x)
y <- sample(x.s)
x[unlist(split(y, (match(y, x.s)-1) %/% k), use.names = FALSE)]
# [1] 2 2 1 3 4 1 4 5 6 3

Old answer:

The bottleneck here is the number of calls to sample(). As long as your numbers don't repeat, I think you can do this with just one call to sample(), like this:

k <- 3
x <- 1:20
y <- sample(x)
unlist(split(y, (match(y,x)-1) %/% k), use.names = FALSE)
# [1]  1  3  2  5  6  4  8  9  7 12 10 11 13 14 15 17 16 18 19 20
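The "numbers don't repeat" condition comes from match(), which returns the position of the first occurrence of each value. A small sketch of what goes wrong with repeated values (the variable names pos, block_by_value and block_by_index are only for illustration):

```r
x <- c(2, 2, 1, 1, 3, 4)
k <- 3

# match() reports the FIRST position at which each value occurs
pos <- match(x, x)
# 1 1 3 3 5 6

# Block ids derived from values: the 4th element (a repeated 1) is assigned
# to the first block, although it sits in the second one
block_by_value <- (pos - 1) %/% k
# 0 0 0 0 1 1

# Block ids derived from positions are what we actually want
block_by_index <- (seq_along(x) - 1) %/% k
# 0 0 0 1 1 1
```

Working on positions rather than values removes the ambiguity, since positions are unique by construction.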

To put everything together in a function (I like the name scramble from @Roland's answer):

scramble <- function(x, k=3) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    idx <- unlist(split(y.s, (match(y.s, x.s)-1) %/% k), use.names = FALSE)
    x[idx]
}

x <- c(2,2,1, 1,3,4, 4,6,5, 3)  # the vector with repeats from the edit above
scramble(x, 3)
# [1] 2 1 2 3 4 1 5 4 6 3
scramble(x, 3)
# [1] 1 2 2 1 4 3 6 5 4 3

To shorten the answer (and make it faster) even further, following @flodel's comment:

scramble <- function(x, k=3L) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    x[unlist(split(x.s[y.s], (y.s-1) %/% k), use.names = FALSE)]
}
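
A quick way to convince yourself that the function behaves as intended is to check that every block of the result is a permutation of the matching block of the input. The sketch below repeats the definition so it runs on its own; is_local_shuffle is a hypothetical helper written for this check, not something from the answers or from base R.

```r
# Same definition as above, repeated so the snippet runs on its own
scramble <- function(x, k = 3L) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    x[unlist(split(x.s[y.s], (y.s - 1) %/% k), use.names = FALSE)]
}

# Hypothetical helper: TRUE if every block of `shuffled` is a permutation
# of the corresponding block of `original`
is_local_shuffle <- function(original, shuffled, k) {
    block <- (seq_along(original) - 1) %/% k
    all(mapply(function(a, b) identical(sort(a), sort(b)),
               split(original, block), split(shuffled, block)))
}

x <- c(2, 2, 1, 1, 3, 4, 4, 6, 5, 3)   # repeated values, short last block
ok_repeats <- is_local_shuffle(x, scramble(x, 3), 3)
ok_range   <- is_local_shuffle(1:100, scramble(1:100, 7), 7)
ok_whole   <- is_local_shuffle(x, scramble(x, 20), 20)  # k > length(x): one block
```

All three checks should come out TRUE regardless of the random draw, since the property holds for every permutation that scramble() can produce.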
Arun answered Sep 28 '22 07:09