I am using permutations from the Combinatorics library on a list with many repeated values. My issue is that permutations is creating all permutations, leading to overflow, even though many of the permutations are identical.
julia> collect(permutations([1, 1, 2, 2], 4))
24-element Array{Array{Int64,1},1}:
[1, 1, 2, 2]
[1, 1, 2, 2]
[1, 2, 1, 2]
[1, 2, 2, 1]
[1, 2, 1, 2]
[1, 2, 2, 1]
[1, 1, 2, 2]
[1, 1, 2, 2]
[1, 2, 1, 2]
[1, 2, 2, 1]
[1, 2, 1, 2]
[1, 2, 2, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 2, 1, 1]
[2, 2, 1, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 2, 1, 1]
[2, 2, 1, 1]
Lots of identical values. What I really want are only the unique permutations, without needing to first generate all permutations:
julia> unique(collect(permutations([1, 1, 2, 2], 4)))
6-element Array{Array{Int64,1},1}:
[1, 1, 2, 2]
[1, 2, 1, 2]
[1, 2, 2, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 2, 1, 1]
I could see the argument that permutations should always return all permutations, whether unique or not, but is there a way to generate only the unique permutations so I don't run out of memory?
There is a multiset_permutation in the package Combinatorics:
julia> for p in multiset_permutations([1,1,2,2],4) p|>println end
[1, 1, 2, 2]
[1, 2, 1, 2]
[1, 2, 2, 1]
[2, 1, 1, 2]
[2, 1, 2, 1]
[2, 2, 1, 1]
Going through unique can be prohibitive even for vectors of relatively small size (e.g. 14 is I think already problematic). In such cases you can consider something like this:
using Combinatorics, StatsBase
function trans(x, v::Dict{T, Int}, l) where T
z = collect(1:l)
idxs = Vector{Int}[]
for k in x
push!(idxs, z[k])
deleteat!(z, k)
end
res = Vector{T}(undef, l)
for (j, k) in enumerate(keys(v))
for i in idxs[j]
res[i] = k
end
end
res
end
function myperms(x)
v = countmap(x)
s = Int[length(x)]
for (k,y) in v
l = s[end]-y
l > 0 && push!(s, l)
end
iter = Iterators.product((combinations(1:s[i], vv) for (i, vv) in enumerate(values(v)))...)
(trans(z, v, length(x)) for z in iter)
end
(this is a quick writeup so the code quality is not production grade - in terms of style and squeezing out maximum performance, but I hope it gives you the idea how this can be approached)
This gives you a generator of unique permutations taking into account duplicates. It is reasonably fast:
julia> x = [fill(1, 7); fill(2, 7)]
14-element Array{Int64,1}:
1
1
1
1
1
1
1
2
2
2
2
2
2
2
julia> @time length(collect(myperms(x)))
0.002902 seconds (48.08 k allocations: 4.166 MiB)
3432
While this operation for unique(permutations(x)) would not terminate in any reasonable size.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With