I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
The output should be, naturally according to my question:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried with set
but only removes those lists that are equal (I thought It should work because sets are by definition without order). Other questions i had visited only has examples with lists exactly duplicated or repeated like this: Python : How to remove duplicate lists in a list of list?. For now order of output (for list and sublists) is not a problem.
(ab)using side-effects version of a list comp:
seen = set()
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
x_set = frozenset(x)
if x_set not in seen:
res.append(x)
seen.add(x_set)
The reason that you add frozenset
s to the set is that you can only add hashable objects to a set
, and vanilla set
s are not hashable.
If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may have duplicates e.g., [1, 2, 1, 3]
then you could use tuple(sorted(sublist))
instead of frozenset(sublist)
that removes duplicates from a sublist.
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
seen = {}
pos = 0
for item in seq:
if key(item) not in seen:
seen[key(item)] = True
seq[pos] = item
pos += 1
del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With