
What is the most time efficient way to remove unordered duplicates in a 2D array?

I've generated a list of combinations using itertools, and I'm getting a result that looks like this:

import itertools
nums = [-5,5,4,-3,0,0,4,-2]
target = 4  # inferred from the sums of the tuples shown below
x = [x for x in set(itertools.combinations(nums, 4)) if sum(x)==target]
>>> x
[(-5, 5, 0, 4), (-5, 5, 4, 0), (5, 4, -3, -2), (5, -3, 4, -2)]

What is the most efficient way, in terms of time complexity, to remove unordered duplicates? Here x[0] and x[1] are duplicates of each other, since they contain the same elements in a different order. Is there anything built in to handle this?

My general approach would be to build a counter of the elements of one tuple and compare it to the counter of the next. Would this be the best approach?
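The counter idea can be sketched roughly as follows. Rather than comparing tuples pairwise, each tuple is reduced to an order-independent key built from its element counts, and a set of seen keys filters out repeats. The helper name `dedupe_by_counts` is hypothetical, and the sample list is the one shown above.

```python
from collections import Counter

x = [(-5, 5, 0, 4), (-5, 5, 4, 0), (5, 4, -3, -2), (5, -3, 4, -2)]

def dedupe_by_counts(tuples):
    """Keep the first tuple seen for each multiset of elements (hypothetical helper)."""
    seen = set()
    result = []
    for t in tuples:
        # A Counter is not hashable, so freeze its (value, count) pairs.
        key = frozenset(Counter(t).items())
        if key not in seen:
            seen.add(key)
            result.append(t)
    return result

unique = dedupe_by_counts(x)
print(unique)
```

Each tuple is visited once and set lookups take constant time on average, so this avoids comparing every tuple against every other one.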

Thank you for any guidance.

Safder asked Jan 04 '20 09:01


2 Answers

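Since you want to remove unordered duplicates, first put each tuple into a canonical form by sorting it, then collect the results in a set. Set elements must be hashable, and the lists returned by sorted are not, so each sorted list is converted back to a tuple before it goes into the set.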

Note: A set is the usual way to eliminate duplicates, provided the elements are hashable and their original order does not need to be preserved.

>>> set(map(tuple,map(sorted,x)))
{(-3, -2, 4, 5), (-5, 0, 4, 5)}
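A possible variation, assuming the input list can be sorted up front: itertools.combinations emits each tuple in the order of its input, so when nums is sorted every combination already comes out sorted, and the set removes the duplicates without sorting each tuple separately. The value of `target` below is an assumption that matches the sums in the question's sample output.

```python
from itertools import combinations

nums = [-5, 5, 4, -3, 0, 0, 4, -2]
target = 4  # assumed value; it matches the sums in the question's sample output

# Sorting once up front means every emitted combination is already sorted,
# so equal multisets produce identical tuples and the set collapses them.
unique = {c for c in combinations(sorted(nums), 4) if sum(c) == target}
print(unique)
```

This still generates the duplicate combinations before discarding them; it only saves the per-tuple sort.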
Ch3steR answered Nov 11 '22 18:11


The best way is to not generate the duplicates in the first place.

The idea is to first build every possible selection of the values that appear multiple times, where each such value is taken 0, 1, ... up to its count times. Then we complete each selection with combinations of the values that appear only once.

from itertools import combinations, product, chain
from collections import Counter

nums = [-5,5,4,-3,0,0,4,-2]

def combinations_without_duplicates(nums, k):
    counts = Counter(nums)
    # values that occur more than once, with their multiplicities
    multiples = {val: count for val, count in counts.items() if count >= 2}
    # values that occur exactly once
    uniques = set(counts) - set(multiples)
    # for each repeated value, every way to include it 0, 1, ..., count times
    possible_multiples = [[[val]*i for i in range(count+1)] for val, count in multiples.items()]
    multiples_part = (list(chain(*x)) for x in product(*possible_multiples))
    # omit the ones that are too long
    multiples_part = (lst for lst in multiples_part if len(lst) <= k)
    # For the sample nums, this would be at this point:
    # [[], [0], [0, 0], [4], [4, 0], [4, 0, 0], [4, 4], [4, 4, 0], [4, 4, 0, 0]]
    for m_part in multiples_part:
        missing = k - len(m_part)
        for c in combinations(uniques, missing):
            yield m_part + list(c)


list(combinations_without_duplicates(nums, 4))

Output:

[[-3, -5, 5, -2],
 [0, -3, -5, 5],
 [0, -3, -5, -2],
 [0, -3, 5, -2],
 [0, -5, 5, -2],
 [0, 0, -3, -5],
 [0, 0, -3, 5],
 [0, 0, -3, -2],
 [0, 0, -5, 5],
 [0, 0, -5, -2],
 [0, 0, 5, -2],
 [4, -3, -5, 5],
 [4, -3, -5, -2],
 [4, -3, 5, -2],
 [4, -5, 5, -2],
 [4, 0, -3, -5],
 [4, 0, -3, 5],
 [4, 0, -3, -2],
 [4, 0, -5, 5],
 [4, 0, -5, -2],
 [4, 0, 5, -2],
 [4, 0, 0, -3],
 [4, 0, 0, -5],
 [4, 0, 0, 5],
 [4, 0, 0, -2],
 [4, 4, -3, -5],
 [4, 4, -3, 5],
 [4, 4, -3, -2],
 [4, 4, -5, 5],
 [4, 4, -5, -2],
 [4, 4, 5, -2],
 [4, 4, 0, -3],
 [4, 4, 0, -5],
 [4, 4, 0, 5],
 [4, 4, 0, -2],
 [4, 4, 0, 0]]
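The listing above contains every distinct combination of size 4 (36 of them), not only the ones that sum to the target from the question. A minimal sketch of the same "never generate duplicates" idea with the sum filter applied is shown below. It swaps in a recursive formulation over the value counts rather than the product-based code above; the names `multiset_combinations` and `build` are hypothetical, and `target = 4` is an assumption that matches the sums in the question's sample output.

```python
from collections import Counter

def multiset_combinations(nums, k):
    """Yield each size-k multiset drawn from nums exactly once, as a sorted tuple."""
    items = sorted(Counter(nums).items())  # (value, available count) pairs

    def build(index, remaining, prefix):
        if remaining == 0:
            yield tuple(prefix)
            return
        if index == len(items):
            return  # ran out of values before filling k slots
        value, count = items[index]
        # Take 0..count copies of this value, then move on to the next value.
        for take in range(min(count, remaining) + 1):
            yield from build(index + 1, remaining - take, prefix + [value] * take)

    yield from build(0, k, [])

nums = [-5, 5, 4, -3, 0, 0, 4, -2]
target = 4  # assumed value; it matches the sums in the question's sample output

all_combos = list(multiset_combinations(nums, 4))
matches = [c for c in all_combos if sum(c) == target]
print(len(all_combos), matches)
```

Because each value is handled once with an explicit copy count, no two yielded tuples contain the same elements, so no deduplication step is needed afterwards.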
Thierry Lathuille answered Nov 11 '22 20:11