I'm looking to run over a list of ids and return a list of any ids that occurred more than once. This is what I set up, and it works:

singles = list(ids)
duplicates = []
while len(singles) > 0:
    elem = singles.pop()
    if elem in singles:
        duplicates.append(elem)
But the ids list is likely to get quite long, and I realistically don't want a while loop predicated on an expensive len call if I can avoid it. (I could go the inelegant route, call len once, and just decrement a counter every iteration, but I'd rather avoid that if I could.)
The smart way to do this is to use a data structure that makes counting easy and efficient, like Counter:
>>> import random
>>> from collections import Counter
>>> ids = [random.randrange(100) for _ in range(200)]
>>> counts = Counter(ids)
>>> dupids = [id for id in ids if counts[id] > 1]
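Note that dupids keeps every occurrence of a repeated id. If you want each duplicated id to appear only once instead, a small variation on the same Counter filters its items directly (dup_once is just an illustrative name, not from the answer above):

>>> dup_once = [id for id, count in counts.items() if count > 1]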
Building the Counter takes O(N) time, as opposed to O(N log N) time for sorting, or O(N^2) for counting each element from scratch every time.
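For comparison, here is roughly what the O(N log N) sorting-based alternative mentioned above might look like; this is a sketch, not part of the original answer, and it assumes the ids are orderable. Unlike dupids, it reports each duplicated id once:

>>> from itertools import groupby
>>> dup_sorted = [id for id, group in groupby(sorted(ids)) if len(list(group)) > 1]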
As a side note, regarding this part of the question:

    "But the ids list is likely to get quite long, and I realistically don't want a while loop predicated on an expensive len call if I can avoid it."
len is not expensive. It's constant time, and (at least on builtin types like list) it's about as fast as a function can possibly get in Python, short of doing nothing at all.
The part of your code that's expensive is calling elem in singles inside the loop: it means that for every element, you have to compare it against potentially every other element, giving quadratic time.
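If you'd rather keep an explicit loop, here is a minimal sketch (my variant, not part of the original answer) that makes it linear by tracking seen elements in a set, whose in check is O(1) on average. Like the original loop, it records an id once per repeat occurrence:

seen = set()
duplicates = []
for elem in ids:
    # set membership is O(1) on average, so the whole pass is O(N)
    if elem in seen:
        duplicates.append(elem)
    seen.add(elem)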