Labeling duplicates in a list

Say I have a list of names in Python, such as the following:

names = ['Alice','Bob','Carl','Dave','Bob','Earl','Carl','Frank','Carl']

I want to make every entry in this list unique, but without removing anything. For each name that appears more than once, I want to append a suffix to that name, where the suffix is n for the n-th occurrence of the name, while preserving the order of the list. Names that appear only once stay unchanged. Since there are 3 Carls in the list, I want to be able to refer to them as Carl_1, Carl_2, and Carl_3 respectively. So in this case the desired output is as follows:

names = ['Alice','Bob_1','Carl_1','Dave','Bob_2','Earl','Carl_2','Frank','Carl_3']

I can do this by looping through the list and modifying each name if it needs to be modified, for example with something like the following code.

def mark_duplicates(name_list):
    output = []
    duplicates = {}  # running count of each repeated name seen so far
    for name in name_list:
        if name_list.count(name) == 1:
            # Unique names are kept unchanged.
            output.append(name)
        else:
            if name in duplicates:
                duplicates[name] += 1
            else:
                duplicates[name] = 1
            output.append(name + "_" + str(duplicates[name]))
    return output

However, this is a lot of work and a lot of lines of code for something that I suspect shouldn't be very hard to do. Is there a simpler way to accomplish this, for example with a list comprehension or a package like itertools?

K. Mao asked Oct 25 '16 20:10




1 Answer

collections.Counter can help cut down on the bookkeeping a bit:

In [105]: from collections import Counter

In [106]: out = []

In [107]: fullcount = Counter(names)

In [108]: nc = Counter()

In [109]: for n in names:
     ...:     nc[n] += 1
     ...:     out.append(n if fullcount[n] == 1 else '{}_{}'.format(n, nc[n]))
     ...:

In [110]: out
Out[110]:
['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
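
Since the question also asks about list comprehensions and itertools, here is a minimal sketch of the same two-counter idea written that way. The function name `label_duplicates` and the use of `defaultdict` with `itertools.count` are illustrative choices, not part of the session above.

```python
from collections import Counter, defaultdict
from itertools import count


def label_duplicates(names):
    """Suffix each name that occurs more than once with its running index."""
    totals = Counter(names)  # total occurrences of each name
    # One independent 1-based counter per name, created on first use.
    running = defaultdict(lambda: count(1))
    return [
        name if totals[name] == 1 else '{}_{}'.format(name, next(running[name]))
        for name in names
    ]


names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
print(label_duplicates(names))
```

Counting everything once up front avoids calling `list.count` inside the loop, so the list is traversed twice in total rather than once per element.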
Randy answered Oct 26 '22 21:10
