How do I either avoid adding duplicate entries to a generator or remove them once they are already there?
If I should be using something else, please advise.
If the values are hashable, the simplest, dumbest way to remove duplicates is to use a set:
values = mygenerator()
unique_values = set(values)
But watch out: sets don't remember what order the values were originally in. So this scrambles the sequence.
The function below might be better than set for your purpose. It filters out duplicates without getting any of the other values out of order:
def nub(it):
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)
Call nub with one argument, any iterable of hashable values. It returns an iterator that produces all the same items, but with the duplicates removed.
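As a quick illustration of calling nub on a generator, here is a minimal sketch. The generator noisy_numbers and its values are made up for the example; nub itself is repeated only so the snippet runs on its own.

```python
def nub(it):
    # Yield each item the first time it appears; skip later repeats.
    seen = set()
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)


def noisy_numbers():
    # Hypothetical generator that produces some repeated values.
    for n in [3, 1, 3, 2, 1, 2, 4]:
        yield n


# nub is lazy, so wrap it in list() to see everything at once.
unique_in_order = list(nub(noisy_numbers()))
print(unique_in_order)  # [3, 1, 2, 4]
```

Because nub is itself a generator, it only remembers the values it has already seen, so it also works on long or unbounded streams as long as the set of distinct values fits in memory.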
itertools.groupby() can collapse adjacent duplicates if you're willing to do a bit of work:
import itertools
print([key for key, _group in itertools.groupby([1, 2, 2, 3])])
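The word "adjacent" matters here. The sketch below shows what groupby does when the repeats are not next to each other, and one possible workaround (sorting first, which gives up the original order). The helper name collapse_adjacent is invented for this example.

```python
import itertools


def collapse_adjacent(values):
    # groupby starts a new group whenever the value changes,
    # so only runs of equal neighbours are merged.
    return [key for key, _group in itertools.groupby(values)]


adjacent = collapse_adjacent([1, 2, 2, 3])
scattered = collapse_adjacent([1, 2, 1, 2, 2])
sorted_first = collapse_adjacent(sorted([1, 2, 1, 2, 2]))

print(adjacent)      # [1, 2, 3]
print(scattered)     # [1, 2, 1, 2]
print(sorted_first)  # [1, 2]
```

Sorting requires the values to be orderable and reads the whole input into memory, so this variant suits small lists rather than a lazy generator.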