Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Permute a vector such that an element can't be in the same place

Tags:

algorithm

r

I want to permute a vector so that an element can't be in the same place after permutation, as it was in the original. Let's say I have a list of elements like this: AABBCCADEF

A valid shuffle would be: BBAADEFCCA

But these would be invalid: BAACFEDCAB or BCABFEDCAB

The closest answer I could find was this: python shuffle such that position will never repeat. But that's not quite what I want, because there are no repeated elements in that example.

I want a fast algorithm that generalizes that answer in the case of repetitions.

MWE:

library(microbenchmark)

set.seed(1)
x <- sample(letters, size=295, replace=T)

terrible_implementation <- function(x) {
  xnew <- sample(x)
  while(any(x == xnew)) {
    xnew <- sample(x)
  }
  return(xnew)
}

microbenchmark(terrible_implementation(x), times=10)


Unit: milliseconds
                       expr      min       lq    mean  median       uq      max neval
 terrible_implementation(x) 479.5338 2346.002 4738.49 2993.29 4858.254 17005.05    10

Also, how do I determine if a sequence can be permuted in such a way?

EDIT: To make it perfectly clear what I want, the new vector should satisfy the following conditions:

1) all(table(newx) == table(x)) 2) all(x != newx)

E.g.:

newx <- terrible_implementation(x)
all(table(newx) == table(x))
[1] TRUE
all(x != newx)
[1] TRUE
like image 461
thc Avatar asked Nov 09 '17 00:11

thc


2 Answers

#DATA
set.seed(1)
x <- sample(letters, size=295, replace=T)

foo = function(S){
    if(max(table(S)) > length(S)/2){
        stop("NOT POSSIBLE")
    }
    U = unique(S)
    done_chrs = character(0)
    inds = integer(0)
    ans = character(0)
    while(!identical(sort(done_chrs), sort(U))){
        my_chrs = U[!U %in% done_chrs]
        next_chr = my_chrs[which.min(sapply(my_chrs, function(x) length(setdiff(which(!S %in% x), inds))))]
        x_inds = which(S %in% next_chr)
        candidates = setdiff(seq_along(S), union(x_inds, inds))
        if (length(candidates) == 1){
            new_inds = candidates
        }else{
            new_inds = sample(candidates, length(x_inds))
        }
        inds = c(inds, new_inds)
        ans[new_inds] = next_chr
        done_chrs = c(done_chrs, next_chr)
    }
    return(ans)
}

ans_foo = foo(x)

identical(sort(ans_foo), sort(x)) & !any(ans_foo == x)
#[1] TRUE

library(microbenchmark)
microbenchmark(foo(x))
#Unit: milliseconds
#   expr      min       lq     mean   median       uq      max neval
# foo(x) 19.49833 22.32517 25.65675 24.85059 27.96838 48.61194   100
like image 97
d.b Avatar answered Nov 07 '22 21:11

d.b


I think this satisfies all your conditions. The idea is to order by the frequency, start with the most common element and shift the value to the next value in the frequency table by the number of times the most common element appears. This will guarantee all elements will be missed.

I've written in data.table, as it helped me during debugging, without losing too much performance. It's a modest improvement performance-wise.

library(data.table)
library(magrittr)
library(microbenchmark)


permute_avoid_same_position <- function(y) {
  DT <- data.table(orig = y)
  DT[, orig_order := .I]

  count_by_letter <- 
    DT[, .N, keyby = orig] %>%
    .[order(N)] %>%
    .[, stable_order := .I] %>%
    .[order(-stable_order)] %>%
    .[]

  out <- copy(DT)[count_by_letter, .(orig, orig_order, N), on = "orig"]
  # Dummy element
  out[, new := first(y)]
  origs <- out[["orig"]]
  nrow_out <- nrow(out)
  maxN <- count_by_letter[["N"]][1]

  out[seq_len(nrow_out) > maxN, new := head(origs, nrow_out - maxN)]
  out[seq_len(nrow_out) <= maxN, new := tail(origs, maxN)]

  DT[out, j = .(orig_order, orig, new), on = "orig_order"] %>%
    .[order(orig_order)] %>%
    .[["new"]]
}

set.seed(1)
x <- sample(letters, size=295, replace=T)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))
microbenchmark(permute_avoid_same_position(x), times = 5)

# Unit: milliseconds
#                           expr      min       lq     mean   median       uq      max
# permute_avoid_same_position(x) 5.650378 5.771753 5.875116 5.788618 5.938604 6.226228

x <- sample(1:1000, replace = TRUE, size = 1e6)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))

microbenchmark(permute_avoid_same_position(x), times = 5)
# Unit: milliseconds
#                           expr      min       lq    mean   median       uq      max
# permute_avoid_same_position(x) 239.7744 385.4686 401.521 438.2999 440.9746 503.0875
like image 2
Hugh Avatar answered Nov 07 '22 21:11

Hugh