Removing some of the duplicates from a list in Python

Question

I would like to remove a certain number of duplicates from a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove three of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would be:

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)
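For reference, the naive version modifies the list in place. Each list.remove call scans from the front and deletes the first match, so the loop can make up to how_many passes over the list. It also raises ValueError, after partially modifying the list, if there are fewer than how_many copies. A small demonstration:

```python
def remove_n_duplicates(remove_from, what, how_many):
    # Each list.remove() call scans from the front and deletes the first match,
    # so this makes up to how_many passes over the list.
    for _ in range(how_many):
        remove_from.remove(what)


lst = [1, 2, 3, 4, 4, 4, 4, 4]
remove_n_duplicates(lst, 4, 3)  # mutates lst in place and returns None
print(lst)  # [1, 2, 3, 4, 4]

short = [1, 4]
try:
    remove_n_duplicates(short, 4, 3)
except ValueError:
    # The first remove succeeded; the second found no 4 left and raised.
    print(short)  # [1]
```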

Is there a way to remove the three 4's in one pass through the list, but keep the other two?

mgilson · Accepted Answer

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0  # occurrences of `what` skipped so far
    for item in remove_from:
        # Skip the first how_many matches; yield everything else.
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
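Unlike the naive version in the question, the generator leaves the original list untouched and produces a new sequence, and it does not raise if there are fewer than how_many matches. If you do want in-place behaviour, one option (a sketch under that assumption, not something the answer itself shows) is to materialize the generator and assign the result back with slice assignment:

```python
def remove_n_dupes(remove_from, what, how_many):
    # Yield every item except the first how_many occurrences of `what`.
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item


lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # a second reference to the same list object

# list(...) finishes the single pass before lst is modified;
# lst[:] = ... then replaces the contents without creating a new list object.
lst[:] = list(remove_n_dupes(lst, 4, 3))
print(lst)    # [1, 2, 3, 4, 4]
print(alias)  # [1, 2, 3, 4, 4] (same object, so it sees the change)

# Fewer matches than requested: no exception, every match is simply dropped.
print(list(remove_n_dupes([1, 4], 4, 3)))  # [1]
```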

Keeping a specified number of duplicates of any item is similarly easy if we use a little extra auxiliary storage:

from collections import Counter


def keep_n_dupes(remove_from, how_many):
    counts = Counter()  # how many times each item has been seen so far
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]
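As a special case, how_many=1 keeps only the first occurrence of each item, which is an order-preserving de-duplication. It gives the same result as list(dict.fromkeys(lst)), since dicts preserve insertion order from Python 3.7 on. A quick comparison:

```python
from collections import Counter


def keep_n_dupes(remove_from, how_many):
    # Yield each item until it has been seen how_many times.
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item


lst = [3, 1, 3, 2, 1, 3]
print(list(keep_n_dupes(lst, 1)))  # [3, 1, 2]
print(list(dict.fromkeys(lst)))    # [3, 1, 2]
print(list(keep_n_dupes(lst, 2)))  # [3, 1, 3, 2, 1]
```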

The inputs to keep_n_dupes are the list and the maximum number of copies of each item that you want to keep. The caveat is that the items need to be hashable, since Counter stores them as dictionary keys.
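If the items are not hashable (lists, for example), one possible workaround, sketched here as an assumption rather than taken from the answer, is to keep the counts in a list of [item, count] pairs and find matches with == instead of hashing. The name keep_n_dupes_unhashable is hypothetical, and the linear search makes it roughly O(n·k) for k distinct items instead of O(n):

```python
def keep_n_dupes_unhashable(remove_from, how_many):
    # Hypothetical variant for unhashable items: counts are stored in a list
    # of [item, count] pairs and looked up by equality, not by hashing.
    seen = []
    for item in remove_from:
        for pair in seen:
            if pair[0] == item:
                pair[1] += 1
                count = pair[1]
                break
        else:
            # No break: this is the first time the item has been seen.
            seen.append([item, 1])
            count = 1
        if count <= how_many:
            yield item


rows = [[1], [2], [1], [1], [2]]
print(list(keep_n_dupes_unhashable(rows, 2)))  # [[1], [2], [1], [2]]
```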

Tags: python, list, duplicates

Jacob Bond