So I have a vector of numbers, say 1:8000. I want generate exactly n
distinct permutations of this vector. What is a way to do this without having to calculate all permutations (since 8000! is huge and can't fit in RAM).
function distinct_permutations(L, N){
# Return N distinct permutations of L as a list of lists
return(x)
}
x <- seq(1:8000)
Using RcppAlgos::permuteSample
, here demonstrated with n=3 samples of 10 out of 10.
set.seed(42)
RcppAlgos::permuteSample(v=10, m=10, n=3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 9 8 5 3 4 1 10 6 2 7
# [2,] 1 3 5 9 7 6 8 10 2 4
# [3,] 2 8 5 9 4 7 3 10 6 1
I will implement my first idea from 3 proposed in comments and compared with the already provided by other User. My solution is more universal as any set of values could be provided (other ones have to be treated as indexes), I provide a list of vectors and my solution is faster for many scenarios.
Current solutions:
perm_n
my own solution powered by sample
and unique
RcppAlgos::permuteSample
arrangements::permutations
EDIT: added additional method
perm_n <- function(vec, n) {
# sample 2 * N samples as a risk of duplicates for bigger samples
unique(
lapply(seq_len(n * 2), function(x) sample(vec, length(vec)))
)[seq_len(n)]
}
microbenchmark::microbenchmark(
RcppAlgos::permuteSample(v=10, m=10, n=10),
perm_n(1:10, 10),
arrangements::permutations(10, 10, nsample = 10)
)
#> Warning in microbenchmark::microbenchmark(RcppAlgos::permuteSample(v = 10, :
#> less accurate nanosecond times to avoid potential integer overflows
#> Unit: microseconds
#> expr min lq mean
#> RcppAlgos::permuteSample(v = 10, m = 10, n = 10) 230.174 249.7105 1605.4817
#> perm_n(1:10, 10) 65.559 68.0190 182.8387
#> arrangements::permutations(10, 10, nsample = 10) 3.649 4.1205 238.7746
#> median uq max neval
#> 1038.961 2292.556 28600.70 100
#> 71.217 82.861 10617.61 100
#> 4.633 7.298 23295.99 100
microbenchmark::microbenchmark(
RcppAlgos::permuteSample(v=8000, m=8000, n=10),
perm_n(1:8000, 10),
arrangements::permutations(8000, 8000, nsample = 10),
times = 10
)
#> Unit: milliseconds
#> expr min lq
#> RcppAlgos::permuteSample(v = 8000, m = 8000, n = 10) 199.5939 199.858108
#> perm_n(1:8000, 10) 4.3911 4.428697
#> arrangements::permutations(8000, 8000, nsample = 10) 593.0577 594.338993
#> mean median uq max neval
#> 200.44507 200.282540 200.944936 201.960178 10
#> 4.49205 4.457807 4.555182 4.663914 10
#> 600.67245 598.277678 604.146808 619.717132 10
microbenchmark::microbenchmark(
RcppAlgos::permuteSample(v=8000, m=8000, n=300),
perm_n(1:8000, 300),
arrangements::permutations(8000, 8000, nsample = 300),
times = 3
)
#> Unit: milliseconds
#> expr min lq
#> RcppAlgos::permuteSample(v = 8000, m = 8000, n = 300) 5939.2831 5965.0268
#> perm_n(1:8000, 300) 136.7174 137.3693
#> arrangements::permutations(8000, 8000, nsample = 300) 17875.7788 17877.2049
#> mean median uq max neval
#> 5984.365 5990.7704 6006.9057 6023.0410 3
#> 138.048 138.0212 138.7133 139.4053 3
#> 17891.089 17878.6310 17898.7438 17918.8566 3
Created on 2022-11-23 with reprex v2.0.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With