How to eliminate duplicate list entries in Python while preserving case-sensitivity?

I'm looking for a way to remove duplicate entries from a Python list, but with a twist: the final list has to keep the original case, with a preference for uppercase words.

For example, between cup and Cup I only need to keep Cup and not cup. Other common solutions suggest calling lower() first, but I'd like to keep each string's original case and, in particular, keep the variant with the uppercase letter over the lowercase one.

Again, I am trying to turn this list:

['Hello', 'hello', 'world', 'world', 'poland', 'Poland']

into this:

['Hello', 'world', 'Poland']

How should I do that?

Thanks in advance.

stratis asked Jul 27 '14 16:07



1 Answer

This does not preserve the order of the words, but it does produce a list of "unique" words with a preference for capitalized ones. Because a set is unordered, the order of the output may differ from run to run.

In [34]: words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland', ]

In [35]: wordset = set(words)

In [36]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[36]: ['world', 'Poland', 'Hello']
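To make the filter condition easier to follow, here is a small sketch that evaluates it word by word. The helper name keep is hypothetical and only for illustration; the condition itself is the one from the comprehension above.

```python
words = ['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
wordset = set(words)


def keep(item, seen):
    # Hypothetical helper: keep a word if it is already title-cased,
    # or if no title-cased variant of it is present in `seen`.
    return item.istitle() or item.title() not in seen


# sorted() is used only to get a stable order for printing.
decisions = {item: keep(item, wordset) for item in sorted(wordset)}
for item, kept in decisions.items():
    print(f"{item!r}: {'keep' if kept else 'drop'}")
# 'Hello': keep
# 'Poland': keep
# 'hello': drop
# 'poland': drop
# 'world': keep
```

Note that the test relies on title case specifically. Given only 'HELLO' and 'hello', both would be kept, since 'HELLO'.istitle() is False and 'Hello' is not in the set.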

If you wish to preserve the order as they appear in words, then you could use a collections.OrderedDict:

In [43]: import collections

In [44]: wordset = collections.OrderedDict.fromkeys(words)

In [46]: [item for item in wordset if item.istitle() or item.title() not in wordset]
Out[46]: ['Hello', 'world', 'Poland']
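If the words may use casings other than title case (for example 'HELLO'), one possible generalization is to group words by their case-folded form and keep one variant per group. The sketch below is not part of the answer above. The function names are hypothetical, and "prefer uppercase" is interpreted here, as an assumption, as "prefer the variant with the most uppercase letters". It also assumes Python 3.7+, where a plain dict preserves insertion order.

```python
def _upper_count(word):
    # Number of uppercase characters in the word.
    return sum(ch.isupper() for ch in word)


def dedupe_prefer_upper(words):
    """Drop case-insensitive duplicates, keeping first-seen order.

    Among words that differ only by case, the variant with the most
    uppercase letters wins; ties go to the earliest occurrence.
    """
    best = {}  # case-folded key -> chosen variant (insertion-ordered)
    for word in words:
        key = word.casefold()
        if key not in best or _upper_count(word) > _upper_count(best[key]):
            # Reassigning an existing key keeps its original position.
            best[key] = word
    return list(best.values())


result = dedupe_prefer_upper(
    ['Hello', 'hello', 'world', 'world', 'poland', 'Poland']
)
print(result)  # ['Hello', 'world', 'Poland']
```

One difference in behavior: each word keeps the position of its first occurrence in any casing, whereas the OrderedDict version places a word where its surviving variant appeared.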
unutbu answered Sep 21 '22 10:09