Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to remove all multiple occurrence items from a list?

Tags:

python

What is the fastest way to remove all multiple occurrence items from a list of arbitrary items (in my example a list of lists)? In the result, only items that occur a single time in the list should show up, thus removing all duplicates.

input: [[1, 2], [1, 3], [1, 4], [1, 2], [1, 4], [1, 2]]

output: [[1, 3], ]

This solution was slow:

output = [item for item in input if input.count(item)==1]

This solution was faster:

duplicates = []
output = []
for item in input:
    if not item in duplicates:
        if item in output:
            output.remove(item)
            duplicates.append(item)
        else:
           output.append(item)

Is there any better solution, probably by first sorting the list? Any ideas are appreciated.

like image 869
Meilo Avatar asked Dec 26 '22 07:12

Meilo


1 Answers

If you don't care about preserving ordering:

from collections import Counter

def only_uniques(seq):
    return [k for k,n in Counter(seq).iteritems() if n == 1]

If you do care about preserving ordering:

from collections import Counter

def only_uniques_ordered(seq):
    counts = Counter(seq)
    return [k for k in seq if counts[k] == 1]

Both algorithms run in O(n) time.


Edit: Forgot you had a list of lists. In order to be able to hash a sequence, it needs to be immutable, so you could do something like this:

list_of_tuples = [tuple(k) for k in list_of_lists]

And then run list_of_tuples through one of the above functions instead. Note that you'll get a list of tuples back out of it - but unless you specifically are modifying the sequences again after this, tuples should work just as well for your purposes.

If you do need to convert back, it's pretty much the same:

list_of_lists = [list(k) for k in list_of_tuples]
like image 165
Amber Avatar answered May 12 '23 21:05

Amber