 

Remove unique values from a list and keep only duplicates

Tags:

python

I'm looking to run over a list of ids and return a list of any ids that occurred more than once. Here's what I set up, and it works:

singles = list(ids)
duplicates = []
while len(singles) > 0:
    elem = singles.pop()
    if elem in singles:
        duplicates.append(elem)

But the ids list is likely to get quite long, and I realistically don't want a while loop predicated on an expensive len call if I can avoid it. (I could go the inelegant route and call len once, then just decrement it every iteration but I'd rather avoid that if I could).

SuperBiasedMan asked Dec 01 '22 14:12

1 Answer

The smart way to do this is to use a data structure that makes it easy and efficient, like Counter:

>>> import random
>>> from collections import Counter
>>> ids = [random.randrange(100) for _ in range(200)]
>>> counts = Counter(ids)
>>> dupids = [id for id in ids if counts[id] > 1]

Building the Counter takes O(N) time, as opposed to O(N log N) time for sorting, or O(N^2) for counting each element from scratch every time.
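As a small variation on the answer above (not part of the original answer): if you only need each duplicated id reported once, rather than every occurrence, you can filter the Counter's items directly:

```python
from collections import Counter

ids = [3, 7, 3, 1, 7, 3]
counts = Counter(ids)

# Keep each id that appears more than once, exactly once.
# Counter preserves first-insertion order (Python 3.7+).
dup_ids = [id_ for id_, count in counts.items() if count > 1]
print(dup_ids)  # [3, 7]
```

This is still a single O(N) pass to build the Counter plus one pass over the distinct ids.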


As a side note:

But the ids list is likely to get quite long, and I realistically don't want a while loop predicated on an expensive len call if I can avoid it.

len is not expensive. It's constant time, and (at least on builtin types like list) it's about as fast as a function call can possibly get in Python short of doing nothing at all.

The part of your code that's expensive is calling elem in singles inside the loop—that means for every element, you have to compare it against potentially every other element, meaning quadratic time.
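To make that concrete, here is a hedged sketch (not from the answer itself) of a single linear pass that replaces the list membership test with a set, where membership checks are O(1) on average:

```python
def find_duplicates(ids):
    """Return the ids that occur more than once, each reported once."""
    seen = set()
    duplicates = set()
    for elem in ids:
        if elem in seen:        # O(1) average, unlike `elem in some_list`
            duplicates.add(elem)
        else:
            seen.add(elem)
    return duplicates

print(sorted(find_duplicates([1, 2, 3, 2, 1, 4])))  # [1, 2]
```

Same overall result as the original while loop, but O(N) instead of quadratic, because no element is ever compared against the rest of the list.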

abarnert answered Dec 06 '22 10:12