Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently remove duplicates, order-agnostic, from list of lists

The following list has some duplicated sublists, with elements in different order:

l1 = [
    ['The', 'quick', 'brown', 'fox'],
    ['hi', 'there'],
    ['jumps', 'over', 'the', 'lazy', 'dog'],
    ['there', 'hi'],
    ['jumps', 'dog', 'over','lazy', 'the'],
]

How can I remove duplicates, retaining the first instance seen, to get:

l1 = [
    ['The', 'quick', 'brown', 'fox'],
    ['hi', 'there'],
    ['jumps', 'over', 'the', 'lazy', 'dog'],
]

I tried to:

[list(i) for i in set(map(tuple, l1))]

Nevertheless, I do not know if this is the fastest way of doing it for large lists, and my attempt is not working as desired. Any idea of how to remove them efficiently?

like image 362
anon Avatar asked Aug 12 '19 18:08

anon


People also ask

How do you get rid of duplicates in a sorted list?

83. Remove Duplicates from Sorted List Given the head of a sorted linked list, delete all duplicates such that each element appears only once. Return the linked list sorted as well. The number of nodes in the list is in the range [0, 300].

How to remove duplicate items or repetitions from Python lists?

Here’s a recap of the different methods you can use to remove duplicate items or repetitions from Python lists. Use the Python list method .append () to add non-repeating items to a new list. The new list contains each item in the original list exactly once and removes all repetitions.

How to remove duplicates from linked list?

The list should only be traversed once. For example if the linked list is 11->11->11->21->43->43->60 then removeDuplicates () should convert the list to 11->21->43->60. Recommended: Please solve it on “ PRACTICE ” first, before moving on to the solution. Traverse the list from the head (or start) node.

Does hashing remove duplicates from a list?

Hence, if we cast a list to a set, duplicates will be removed. However, the original order of elements is not guaranteed. The order of elements in a set is decided by hashing mechanism, which may be different than in the list.


1 Answers

This one is a little tricky. You want to key a dict off of frozen counters, but counters are not hashable in Python. For a small degradation in the asymptotic complexity, you could use sorted tuples as a substitute for frozen counters:

seen = set()
result = []
for x in l1:
    key = tuple(sorted(x))
    if key not in seen:
        result.append(x)
        seen.add(key)

The same idea in a one-liner would look like this:

[*{tuple(sorted(k)): k for k in reversed(l1)}.values()][::-1]
like image 182
wim Avatar answered Sep 23 '22 19:09

wim