What is the fastest way to remove every item that occurs more than once from a list of arbitrary items (in my example, a list of lists)? The result should contain only the items that occur exactly once in the input, i.e. all duplicates are dropped entirely.
input: [[1, 2], [1, 3], [1, 4], [1, 2], [1, 4], [1, 2]]
output: [[1, 3]]
This solution was slow:
output = [item for item in input if input.count(item)==1]
This solution was faster:
duplicates = []
output = []
for item in input:
    if item not in duplicates:
        if item in output:
            output.remove(item)
            duplicates.append(item)
        else:
            output.append(item)
Is there any better solution, perhaps by first sorting the list? Any ideas are appreciated.
If you don't care about preserving ordering:
from collections import Counter
def only_uniques(seq):
    return [k for k, n in Counter(seq).items() if n == 1]
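For example, a quick sanity check (a sketch assuming Python 3, where Counter exposes items() rather than Python 2's iteritems()):

```python
from collections import Counter

def only_uniques(seq):
    # Keep only the keys whose count is exactly 1.
    return [k for k, n in Counter(seq).items() if n == 1]

result = only_uniques([3, 1, 4, 1, 5, 3])
print(sorted(result))  # [4, 5] -- the output order itself is not guaranteed
```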
If you do care about preserving ordering:
from collections import Counter
def only_uniques_ordered(seq):
    counts = Counter(seq)
    return [k for k in seq if counts[k] == 1]
Both algorithms run in O(n) time.
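A small check of the ordered version (Python 3); the second pass over seq preserves the input order of the surviving items:

```python
from collections import Counter

def only_uniques_ordered(seq):
    counts = Counter(seq)
    # Second pass keeps the original ordering of the unique items.
    return [k for k in seq if counts[k] == 1]

print(only_uniques_ordered([3, 1, 4, 1, 5, 3]))  # [4, 5]
```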
Edit: Forgot you had a list of lists. Counter needs hashable keys, and a sequence must be immutable to be hashable, so convert each inner list to a tuple first:
list_of_tuples = [tuple(k) for k in list_of_lists]
And then run list_of_tuples through one of the above functions instead. Note that you'll get a list of tuples back out of it - but unless you are specifically modifying the sequences again afterwards, tuples should work just as well for your purposes.
If you do need to convert back, it's pretty much the same:
list_of_lists = [list(k) for k in list_of_tuples]
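Putting it together on the original example input (a sketch; the round-trip through tuples is only needed because lists are unhashable):

```python
from collections import Counter

def only_uniques_ordered(seq):
    counts = Counter(seq)
    return [k for k in seq if counts[k] == 1]

data = [[1, 2], [1, 3], [1, 4], [1, 2], [1, 4], [1, 2]]
as_tuples = [tuple(k) for k in data]                          # lists -> hashable tuples
result = [list(k) for k in only_uniques_ordered(as_tuples)]   # tuples -> lists again
print(result)  # [[1, 3]]
```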