
Removing some of the duplicates from a list in Python

I would like to remove a certain number of duplicates from a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be:

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)

Is there a way to remove the three 4's in one pass through the list, but keep the other two?

Jacob Bond asked Jul 26 '16 20:07



1 Answer

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
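
Note that the generator builds a new sequence, whereas the function in the question mutates its argument. If you need the in-place behaviour, one option is slice assignment. The sketch below repeats the generator so that it runs on its own; `alias` is a made-up name used only to show that the original list object is the one that changes. It relies on CPython consuming the right-hand side completely before it touches the list, which is my understanding of how slice assignment from an iterator works there.

```python
def remove_n_dupes(remove_from, what, how_many):
    """Yield the items of remove_from, skipping the first how_many equal to what."""
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1  # skip this occurrence
        else:
            yield item

lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list
# instead of rebinding the name to a new list.
lst[:] = remove_n_dupes(lst, 4, 3)

print(alias)  # [1, 2, 3, 4, 4]
```

This still makes a single pass over the data, at the cost of a temporary copy while the slice is being replaced.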

Keeping at most a specified number of copies of every item is similarly easy if we use a little auxiliary storage:

from collections import Counter
def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]

Here the input is the list and the maximum number of copies of each item that you want to keep. The caveat is that the items need to be hashable, because the Counter stores them as dictionary keys.
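
If your items are not hashable (lists, for example), a possible fallback is to count with a plain list and compare by equality. This is only a sketch: `keep_n_dupes_unhashable` is a name I made up, and the linear search makes it roughly O(n * m) for m distinct items, so it is only reasonable when there are few distinct values.

```python
def keep_n_dupes_unhashable(remove_from, how_many):
    """Yield items, keeping at most how_many of each, comparing with == only."""
    seen = []  # [item, count] pairs; a linear search stands in for the dict lookup
    for item in remove_from:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                break
        else:
            # No equal item seen so far: start a new count for it.
            entry = [item, 1]
            seen.append(entry)
        if entry[1] <= how_many:
            yield item

lst = [[1], [1], [1], [2], [1], [2], [2]]
print(list(keep_n_dupes_unhashable(lst, 2)))  # [[1], [1], [2], [2]]
```

For hashable items the Counter version above is the better choice, since each lookup is constant time on average.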

mgilson answered Oct 01 '22 06:10
