Python: Rename duplicates in list with progressive numbers without sorting list

My solution with map and lambda:

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

More traditional form

newlist = []
for i, v in enumerate(mylist):
    totalcount = mylist.count(v)
    count = mylist[:i].count(v)
    newlist.append(v + str(count + 1) if totalcount > 1 else v)

And last one

[v + str(mylist[:i].count(v) + 1) if mylist.count(v) > 1 else v for i, v in enumerate(mylist)]

This is how I would do it. EDIT: I wrote this into a more generalized utility function since people seem to like this answer.

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure

from collections import Counter # Counter counts the number of occurrences of each item
from itertools import tee, count

def uniquify(seq, suffs=None):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is a mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    if suffs is None:
        # create a fresh counter on every call; a default of count(1) in the
        # signature would be shared, so later calls would not restart at 1
        suffs = count(1)
    not_unique = [k for k, v in Counter(seq).items() if v > 1]  # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))
    for idx, s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

uniquify(copy)
assert copy==check  # raise an error if we failed
mylist = copy  # success

If you wanted to append an underscore before each count, you could do something like this:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']

...or if you wanted to use letters instead:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']

NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is it is easy to understand and read, and you're not going to see much of a performance difference unless you have an extremely large list.

EDIT: Here is my original answer as a one-liner. However, the order is not preserved, and the original loop form relied on the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.

[s + str(suffix) if num>1 else s for s,num in Counter(mylist).items() for suffix in range(1, num+1)]
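# Produces (Python 3.7+): ['name1', 'name2', 'name3', 'state', 'city', 'zip1', 'zip2']
# On older versions the groups come out in arbitrary order, e.g. ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']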

Any method where count is called on each element is going to result in O(n^2) since count is O(n). You can do something like this:

# not modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}
newlist = mylist[:]

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        newlist[i] += str(counts[item])
        counts[item]-=1
print(newlist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

# modifying original list
from collections import Counter

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k:v for k,v in Counter(mylist).items() if v > 1}      

for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        mylist[i] += str(counts[item])
        counts[item]-=1
print(mylist)

# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

This should be O(n).
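
As a quick sanity check, the sketch below wraps the count-based comprehension and the reverse-walk approach in two functions and confirms that they agree on the sample list. The function names rename_quadratic and rename_linear are made up for this illustration and do not come from any answer.

```python
from collections import Counter


def rename_quadratic(items):
    """Count-based approach: calls list.count for every element."""
    return [
        v + str(items[:i].count(v) + 1) if items.count(v) > 1 else v
        for i, v in enumerate(items)
    ]


def rename_linear(items):
    """Reverse-walk approach: one Counter pass, then one pass over the list."""
    # keep only the items that occur more than once
    remaining = {k: n for k, n in Counter(items).items() if n > 1}
    result = list(items)  # leave the input untouched
    for i in reversed(range(len(items))):
        item = items[i]
        if item in remaining:
            # walking backwards, the remaining count is this occurrence's number
            result[i] += str(remaining[item])
            remaining[item] -= 1
    return result


sample = ["name", "state", "name", "city", "name", "zip", "zip"]
assert rename_quadratic(sample) == rename_linear(sample)
print(rename_linear(sample))
```

Both functions return a new list, so they can be timed against each other on larger inputs if you want to see the difference yourself.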

Other provided answers:

mylist.index(s) per element causes O(n^2)

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]

from collections import Counter
counts = Counter(mylist)
for s,num in counts.items():
    if num > 1:
        for suffix in range(1, num + 1):
            mylist[mylist.index(s)] = s + str(suffix) 

count(x[1]) per element causes O(n^2)
It is also used multiple times per element along with list slicing.

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

Benchmarks:

http://nbviewer.ipython.org/gist/dting/c28fb161de7b6287491b


Here's a very simple O(n) solution. Walk the list once, storing for each element the index of its first occurrence and the number of times it has been seen so far. If we've seen an element before, use the stored data to append the occurrence number.

This approach needs only one extra dictionary for look-back. It avoids look-ahead, so no temporary list slices are created.

mylist = ["name", "state", "name", "city", "city", "name", "zip", "zip", "name"]

dups = {}

for i, val in enumerate(mylist):
    if val not in dups:
        # Store index of first occurrence and occurrence value
        dups[val] = [i, 1]
    else:
        # Special case for first occurrence
        if dups[val][1] == 1:
            mylist[dups[val][0]] += str(dups[val][1])

        # Increment occurrence value, index value doesn't matter anymore
        dups[val][1] += 1

        # Use stored occurrence value
        mylist[i] += str(dups[val][1])

print(mylist)

# ['name1', 'state', 'name2', 'city1', 'city2', 'name3', 'zip1', 'zip2', 'name4']

A list comprehension version of the Rick Teachey answer, "two-liner":

from collections import Counter

m = ["name", "state", "name", "city", "name", "zip", "zip"]

d = {a:list(range(1, b+1)) if b>1 else '' for a,b in Counter(m).items()}
[i+str(d[i].pop(0)) if len(d[i]) else i for i in m]
#['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

You can use a hash table to solve this problem. Define a dictionary d whose key is the string and whose value is (first_time_index_in_the_list, times_of_appearance). Every time you see a word, check the dictionary: if times_of_appearance has just reached 2, use first_time_index_in_the_list to append '1' to the first occurrence, and append times_of_appearance to the current element. If it is greater than 2, just append times_of_appearance to the current element.
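
The description above could be sketched roughly as follows. This is one possible reading of it, not the answerer's own code: the function name number_duplicates and the choice to return a new list instead of modifying the input are assumptions made for the example.

```python
def number_duplicates(words):
    """Return a new list in which repeated words are suffixed 1, 2, 3, ...

    Hypothetical helper: the name and the copy-instead-of-mutate behaviour
    are illustration choices, not part of the original answer.
    """
    result = list(words)
    seen = {}  # word -> (index of first occurrence, times seen so far)
    for i, word in enumerate(words):
        if word not in seen:
            seen[word] = (i, 1)
            continue
        first_index, times = seen[word]
        times += 1
        seen[word] = (first_index, times)
        if times == 2:
            # second sighting: go back and number the first occurrence
            result[first_index] += "1"
        result[i] += str(times)
    return result


print(number_duplicates(["name", "state", "name", "city", "name", "zip", "zip"]))
```

Each word costs one dictionary lookup, so the whole pass stays linear in the length of the list.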


Less fancy stuff.

from collections import defaultdict
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
finalList = []
dictCount = defaultdict(int)
anotherDict = defaultdict(int)
for t in mylist:
   anotherDict[t] += 1
for m in mylist:
   dictCount[m] += 1
   if anotherDict[m] > 1:
       finalList.append(str(m)+str(dictCount[m]))
   else:
       finalList.append(m)
print(finalList)