Filter items that only occurs once in a very large list

Question

I have a large list(over 1,000,000 items), which contains english words:

tokens = ["today", "good", "computer", "people", "good", ... ]

I'd like to get all the items that occurs only once in the list

now I'm using:

tokens_once = set(word for word in set(tokens) if tokens.count(word) == 1)

but it's really slow. how could I make this faster?

eumiro · Accepted Answer

You iterate over a list and then for each element you do it again, which makes it O(N²). If you replace your count by a Counter, you iterate once over the list and then once again over the list of unique elements, which makes it, in the worst case, O(2N), i.e. O(N).

from collections import Counter

tokens = ["today", "good", "computer", "people", "good"]
single_tokens = [k for k, v in Counter(tokens).iteritems() if v == 1 ]
# single_tokens == ['today', 'computer', 'people']

Filter items that only occurs once in a very large list

Tags:

performance

python

algorithm

list

filter

wong2

1 Answers

eumiro

Recent Activity

Donate For Us

Filter items that only occurs once in a very large list

Tags:

performance

python

algorithm

list

filter

wong2

1 Answers

eumiro

Related questions

Recent Activity

Donate For Us