python generators duplicates

Tags: python

How do I either avoid adding duplicate entries to a generator, or remove them once they are already there?

If I should be using something else, please advise.

cynicaljoy asked Nov 19 '10 22:11


2 Answers

If the values are hashable, the simplest, dumbest way to remove duplicates is to use a set:

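values = mygenerator()        # your generator (any iterable of hashable values works)
unique_values = set(values)   # consumes the whole generator at once; order is not kept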

But watch out: sets don't remember what order the values were originally in, so this can scramble the sequence.
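A small sketch of both the deduplication and the ordering caveat. `numbers()` is a hypothetical stand-in for `mygenerator()`. The last step uses `dict.fromkeys`, which is a different technique from the one in this answer: since Python 3.7 dicts keep insertion order, so it drops duplicates while keeping first-seen order. Like `set()`, though, it reads the entire generator up front.

```python
def numbers():
    # Hypothetical stand-in for mygenerator(): yields some repeated values.
    yield from [3, 1, 3, 2, 1]

# set() removes duplicates but makes no promise about order.
unique_values = set(numbers())

# dict keys are unique and (Python 3.7+) keep insertion order.
ordered_unique = list(dict.fromkeys(numbers()))

print(sorted(unique_values), ordered_unique)  # prints [1, 2, 3] [3, 1, 2]
```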

The function below might be better than set for your purpose. It filters out duplicates without getting any of the other values out of order:

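def nub(it):
    """Yield the items of `it` in their original order, skipping repeats."""
    seen = set()  # every value yielded so far
    for x in it:
        if x not in seen:
            yield x
            seen.add(x)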

Call nub with one argument, any iterable of hashable values. It returns an iterator that produces the same items in their original order, keeping only the first occurrence of each value.
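Because `nub` is itself a generator, it is lazy: it pulls items from its input only as they are requested, so it even works on an infinite generator. Below is a sketch of a variant, `nub_by`, with a hypothetical `key` parameter (not part of the answer) for values that aren't hashable, such as lists: the key function maps each item to something hashable, and only those keys are stored in `seen`.

```python
import itertools

def nub_by(it, key=None):
    """Yield items of `it` in order, skipping any whose key was already seen.

    Hypothetical extension of nub(): `key` maps each item to a hashable
    value; by default the item itself is used.
    """
    seen = set()
    for x in it:
        k = x if key is None else key(x)
        if k not in seen:
            seen.add(k)
            yield x

# Infinite input 0, 0, 1, 1, 2, 2, ...: islice stops after five unique values.
halves = (n // 2 for n in itertools.count())
first_five = list(itertools.islice(nub_by(halves), 5))  # [0, 1, 2, 3, 4]

# Lists are unhashable, so tuple() serves as the key.
rows = [[1, 2], [3], [1, 2], [3], [4]]
unique_rows = list(nub_by(rows, key=tuple))  # [[1, 2], [3], [4]]
```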

Jason Orendorff answered Nov 01 '22 22:11


itertools.groupby() can collapse adjacent duplicates if you're willing to do a bit of work: it yields one (key, group) pair per run of equal values, so keep just the keys.

import itertools
print([key for key, _group in itertools.groupby([1, 2, 2, 3])])
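Note that `groupby` only merges runs of equal, adjacent values, so a duplicate that reappears later survives. Sorting first puts equal values next to each other so every duplicate collapses, at the cost of losing the original order and reading the whole input into memory. A quick sketch with made-up sample data:

```python
import itertools

data = [1, 2, 2, 3, 2]

# One key per run of equal adjacent values: the final 2 is a new run.
runs = [key for key, _group in itertools.groupby(data)]

# Sorting first makes all equal values adjacent, so each appears once.
all_unique = [key for key, _group in itertools.groupby(sorted(data))]

print(runs, all_unique)  # prints [1, 2, 3, 2] [1, 2, 3]
```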

Ignacio Vazquez-Abrams answered Nov 01 '22 21:11