I have a list of strings such as:
myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]
I want this outcome (and this is the only acceptable outcome):
myList = ["paper", "Plastic", "aluminum", "tin", "glass", "Polypropylene Plastic"]
Note that if an item ("Polypropylene Plastic") happens to contain another item ("Plastic"), I would still like to retain both items. So the case can differ, but an item must be a letter-for-letter match to be removed.
The original list order must be retained. All duplicates after the first instance of that item should be removed. The original case of that first instance should be preserved, as well as the original cases of all non-duplicate items.
I've searched and only found questions that address one need or the other, not both.
If you want to preserve the order while removing duplicate elements from a list in Python, you can use the OrderedDict class from the collections module. More specifically, OrderedDict.fromkeys(list) returns a dictionary whose keys are the list's elements with duplicates removed, in their original order; wrapping it in list() turns it back into a list.
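A minimal illustration with the question's list. Keys are compared exactly, so the comparison is case-sensitive: "PAPer" and "PAPER" survive, which is why this alone does not meet the question's requirement. Since Python 3.7 a plain dict also keeps insertion order, so dict.fromkeys behaves the same way.

```python
from collections import OrderedDict

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]

# fromkeys() uses each item as a key in first-seen order; a repeated key
# does not create a new entry, so only exact (case-sensitive) duplicates collapse.
deduped = list(OrderedDict.fromkeys(myList))
print(deduped)
# ['paper', 'Plastic', 'aluminum', 'PAPer', 'tin', 'glass', 'PAPER', 'Polypropylene Plastic']
```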
Remove duplicates from a list using a set. To remove duplicates from a list, you can use the built-in set() function, which keeps only distinct elements. For example, the list [1, 1, 2, 3, 2, 2, 4, 5, 6, 2, 1] reduces to the distinct values 1, 2, 3, 4, 5 and 6.
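A short sketch of that example. A set has no defined order, so the result is sorted here only to make the printed output predictable; for strings, set() is also case-sensitive.

```python
numbers = [1, 1, 2, 3, 2, 2, 4, 5, 6, 2, 1]

# set() keeps one copy of each distinct value but does not keep the
# original order, so sort it if a predictable order is needed.
unique = set(numbers)
print(sorted(unique))  # [1, 2, 3, 4, 5, 6]
```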
It's difficult to code that with a list comprehension (or only at the expense of clarity) because filtering out duplicates needs an accumulation/memory effect.
It's also not possible to use a set comprehension, because a set doesn't preserve the original order.
The classic way is a loop with an auxiliary set in which you store the lowercase version of each string you encounter. Append the string to the result list only if its lowercased version isn't already in the set:
myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]
result = []
marker = set()  # lowercased strings seen so far
for item in myList:
    key = item.lower()
    if key not in marker:  # test presence
        marker.add(key)
        result.append(item)  # preserve order and the first spelling
print(result)
result:
['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
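The same loop can also be written with a dict keyed by the lowercased string, as a sketch: dict.setdefault() stores a value only when the key is new, so the first spelling wins, and dicts keep insertion order (Python 3.7+). The name first_seen is just illustrative.

```python
myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]

# Map each lowercased string to the first original spelling seen for it;
# later case variants ("PAPer", "PAPER") hit an existing key and are ignored.
first_seen = {}
for item in myList:
    first_seen.setdefault(item.lower(), item)

result = list(first_seen.values())
print(result)
# ['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
```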
Using .casefold() instead of .lower() lets you handle subtle casing differences in some locales (like the German "ß", which casefolds to "ss", as in Strasse/Straße).
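To see the difference, here is a sketch with a hypothetical helper, dedupe(), that takes the normalizing function as a parameter. "Straße".lower() stays "straße", while "Straße".casefold() becomes "strasse".

```python
def dedupe(items, key):
    """Hypothetical helper: drop items whose key(item) was already seen,
    keeping the first spelling of each and the original order."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result

words = ["Strasse", "Straße", "STRASSE", "Weg"]
print(dedupe(words, str.lower))     # ['Strasse', 'Straße', 'Weg']
print(dedupe(words, str.casefold))  # ['Strasse', 'Weg']
```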
Edit: it is possible to do that with a list comprehension, but it's really hacky:
marker = set()
result = [not marker.add(x.casefold()) and x for x in myList if x.casefold() not in marker]
It applies not and and to the None returned by set.add, so that the function gets called (a side effect in a list comprehension, rarely a good thing...) while x is still returned every time. The main disadvantages are:
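casefold() is called twice: once for testing and once for storing in the marker set.
Another approach uses pandas: group the strings by their lowercase form, keeping groups in order of first appearance (sort=False), and take the first original string in each group:
import pandas as pd
df = pd.DataFrame(myList)
df['lower'] = df[0].str.lower()
print(df.groupby('lower', sort=False)[0].first().tolist())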
output:
['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
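A possible pandas variant without groupby, as a sketch: Series.duplicated() marks every repeat of a value after its first occurrence (keep="first" is the default), so negating it on the lowercased strings keeps the first spelling of each item in the original order.

```python
import pandas as pd

myList = ["paper", "Plastic", "aluminum", "PAPer", "tin", "glass", "tin", "PAPER", "Polypropylene Plastic"]

s = pd.Series(myList)
# True for the first occurrence of each lowercased value, False for later repeats.
mask = ~s.str.lower().duplicated()
result = s[mask].tolist()
print(result)
# ['paper', 'Plastic', 'aluminum', 'tin', 'glass', 'Polypropylene Plastic']
```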