I have a large list (over 1,000,000 items) containing English words:
tokens = ["today", "good", "computer", "people", "good", ... ]
I'd like to get all the items that occur only once in the list.
Right now I'm using:
tokens_once = set(word for word in set(tokens) if tokens.count(word) == 1)
but it's really slow. How can I make this faster?
For each unique word, tokens.count(word) scans the whole list again, which makes the approach O(N²) in the worst case. If you replace the count calls with a Counter, you iterate once over the list and then once over the unique elements, which is, in the worst case, O(2N), i.e. O(N).
from collections import Counter
tokens = ["today", "good", "computer", "people", "good"]
single_tokens = [k for k, v in Counter(tokens).items() if v == 1]  # use .iteritems() on Python 2
# single_tokens == ['today', 'computer', 'people']
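If you want a set, as in the question, the same idea can be wrapped in a small function. This is only a sketch for Python 3: the function names and the sample list are made up for illustration, and the slow version is included just to check that both approaches agree on a small input.

```python
from collections import Counter


def tokens_occurring_once(tokens):
    """Return the set of tokens that appear exactly once in `tokens`."""
    counts = Counter(tokens)  # one pass over the whole list: O(N)
    # One pass over the unique tokens, keeping those counted exactly once.
    return {word for word, count in counts.items() if count == 1}


def tokens_occurring_once_slow(tokens):
    """The list.count() approach from the question, kept only for comparison."""
    return set(word for word in set(tokens) if tokens.count(word) == 1)


# Hypothetical sample data, small enough that the slow version is instant.
sample = ["today", "good", "computer", "people", "good"]
fast = tokens_occurring_once(sample)
slow = tokens_occurring_once_slow(sample)
print(fast == slow)   # True
print(sorted(fast))   # ['computer', 'people', 'today']
```

Both functions return the same set; only the Counter version avoids rescanning the list for every unique word, which is what matters at 1,000,000 items.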