Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove duplicated lists in list of lists in Python

I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example

g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]

The output should be, naturally according to my question:

g = [[1,2,3],[9,0,1],[4,3,2]]

I've tried with set but only removes those lists that are equal (I thought It should work because sets are by definition without order). Other questions i had visited only has examples with lists exactly duplicated or repeated like this: Python : How to remove duplicate lists in a list of list?. For now order of output (for list and sublists) is not a problem.

like image 450
Alejandro Sazo Avatar asked Dec 19 '22 14:12

Alejandro Sazo


2 Answers

(ab)using side-effects version of a list comp:

seen = set()

[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]

For those (unlike myself) who don't like using side-effects in this manner:

res = []
seen = set()

for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)

The reason that you add frozensets to the set is that you can only add hashable objects to a set, and vanilla sets are not hashable.

like image 70
roippi Avatar answered Jan 01 '23 00:01

roippi


If you don't care about the order for lists and sublists (and all items in sublists are unique):

result = set(map(frozenset, g))

If a sublist may have duplicates e.g., [1, 2, 1, 3] then you could use tuple(sorted(sublist)) instead of frozenset(sublist) that removes duplicates from a sublist.

If you want to preserve the order of sublists:

def del_dups(seq, key=frozenset):
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]

Example:

del_dups(g, key=lambda x: tuple(sorted(x)))

See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

like image 39
jfs Avatar answered Dec 31 '22 22:12

jfs