In R, how do I locally shuffle a vector's elements?

Tags:

sample

I have the following vector of numbers in R:

x <- 1:100  # i.e. c(1, 2, 3, 4, ..., 100)

I want to randomize this vector "locally", based on an input number that I'll call the "locality factor". For example, if the locality factor is 3, the first 3 elements are shuffled among themselves, then the next 3 elements, and so on. Is there an efficient way to do this? I know that calling sample() on the whole vector would jumble up the entire vector. Thanks in advance.

asked Jul 14 '13 15:07

broccoli

2 Answers

Arun didn't like how inefficient my other answer was, so here's something very fast just for him ;)

It requires just one call each to runif() and order(), and doesn't use sample() at all.

x <- 1:100
k <- 3
n <- length(x)

x[order(rep(seq_len(ceiling(n/k)), each=k, length.out=n) + runif(n))]
#  [1]   3   1   2   6   5   4   8   9   7  11  12  10  13  14  15  18  16  17
# [19]  20  19  21  23  22  24  27  25  26  29  28  30  33  31  32  36  34  35
# [37]  37  38  39  40  41  42  43  44  45  47  48  46  51  49  50  52  54  53
# [55]  55  57  56  58  60  59  62  63  61  66  64  65  68  67  69  71  70  72
# [73]  75  74  73  76  77  78  81  80  79  84  82  83  86  85  87  89  88  90
# [91]  93  92  91  94  96  95  97  98  99 100
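To see why this works, here is a minimal sketch that wraps the one-liner in a function (the name local_shuffle is hypothetical, not part of the answer). Each element gets a sort key made of its block number plus a uniform draw from runif(). For vectors of ordinary size the keys of different blocks should never interleave, so order() only permutes elements inside their own block.

```r
# Hypothetical wrapper around the one-liner above.
local_shuffle <- function(x, k = 3L) {
    n <- length(x)
    # Block id of each position: 1,1,1,2,2,2,... for k = 3
    block <- rep(seq_len(ceiling(n / k)), each = k, length.out = n)
    # The uniform draw gives a random order inside each block, while the
    # integer part keeps the blocks themselves in their original order.
    x[order(block + runif(n))]
}

x <- 1:100
y <- local_shuffle(x, 3)

# Sorting each block of the result should recover the original vector.
block <- (seq_along(x) - 1) %/% 3
identical(unlist(lapply(split(y, block), sort), use.names = FALSE), x)
```

The last block may be shorter than k (here it holds only the value 100), and it is handled the same way as the others.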

answered Sep 28 '22 08:09

Josh O'Brien

General solution:

Edit: As @MatthewLundberg comments, the issue I pointed out with "repeating numbers in x" can be easily overcome by working on seq_along(x), which would mean the resulting values will be indices. So, it'd be like so:

k <- 3
x <- c(2,2,1, 1,3,4, 4,6,5, 3)
x.s <- seq_along(x)
y <- sample(x.s)
x[unlist(split(y, (match(y, x.s)-1) %/% k), use.names = FALSE)]
# [1] 2 2 1 3 4 1 4 5 6 3
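The problem with repeated values comes from match() returning only the position of the first match, so a duplicate can end up in the block of an earlier copy. A small illustration with the same x (the names by_value and by_position exist only for this sketch):

```r
k <- 3
x <- c(2,2,1, 1,3,4, 4,6,5, 3)

# Block ids derived from the values: duplicates collapse onto the first match.
by_value <- (match(x, x) - 1) %/% k
by_value
# [1] 0 0 0 0 1 1 1 2 2 1

# Block ids derived from the positions: every element keeps its own block.
by_position <- (seq_along(x) - 1) %/% k
by_position
# [1] 0 0 0 1 1 1 2 2 2 3
```

In the value-based version, the second 1 (position 4) is pulled into the first block and the final 3 (position 10) into the second, which is why working on seq_along(x) is the safer choice.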

Old answer:

The bottleneck here is the number of calls to sample(). As long as your numbers don't repeat, I think you can do this with just one call to sample(), like this:

k <- 3
x <- 1:20
y <- sample(x)
unlist(split(y, (match(y,x)-1) %/% k), use.names = FALSE)
# [1]  1  3  2  5  6  4  8  9  7 12 10 11 13 14 15 17 16 18 19 20
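For comparison, this is roughly what the straightforward approach looks like, with one random draw per block. It is only a sketch of the baseline that the single-call trick avoids; the name naive_scramble is made up for illustration.

```r
# Hypothetical baseline: shuffle each block of k elements separately.
naive_scramble <- function(x, k = 3L) {
    # Split the values into consecutive blocks of (at most) k elements.
    blocks <- split(x, (seq_along(x) - 1) %/% k)
    # Index with sample.int() rather than calling sample(b) directly:
    # sample(b) on a single number n would return a permutation of 1:n.
    shuffled <- lapply(blocks, function(b) b[sample.int(length(b))])
    unlist(shuffled, use.names = FALSE)
}

naive_scramble(1:20, 3)
```

With a vector of length n this makes about n/k calls to sample.int(), which is where the time goes when k is small and n is large.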

To put everything together in a function (I like the name scramble from @Roland's):

scramble <- function(x, k=3) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    idx <- unlist(split(y.s, (match(y.s, x.s)-1) %/% k), use.names = FALSE)
    x[idx]
}

x <- c(2,2,1, 1,3,4, 4,6,5, 3)  # same x as in the edit above
scramble(x, 3)
# [1] 2 1 2 3 4 1 5 4 6 3
scramble(x, 3)
# [1] 1 2 2 1 4 3 6 5 4 3

To shorten the answer (and make it faster) even more, following @flodel's comment:

scramble <- function(x, k=3L) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    x[unlist(split(x.s[y.s], (y.s-1) %/% k), use.names = FALSE)]
}
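A quick way to convince yourself that a function like this only moves elements within their own block is to compare the sorted blocks before and after. A minimal sketch: check_local is a hypothetical helper, and scramble is repeated from above so that the snippet runs on its own.

```r
scramble <- function(x, k = 3L) {
    x.s <- seq_along(x)
    y.s <- sample(x.s)
    x[unlist(split(x.s[y.s], (y.s - 1) %/% k), use.names = FALSE)]
}

# Hypothetical helper: TRUE if every block of k positions holds the same
# values (counting repeats) in x and in y.
check_local <- function(x, y, k) {
    block <- (seq_along(x) - 1) %/% k
    identical(lapply(split(x, block), sort), lapply(split(y, block), sort))
}

x <- c(2,2,1, 1,3,4, 4,6,5, 3)
check_local(x, scramble(x, 3), 3)   # expected to be TRUE
check_local(x, rev(x), 3)           # FALSE: rev() moves values across blocks
```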

answered Sep 28 '22 07:09

Arun
