I have a list of strings:
In [53]: l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
In [54]: set(l)
Out[54]: {'#TrendinG', '#Trending', '#YAX', '#Yax'}
I want to have a case-insensitive set of this list.
Expected Result:
Out[55]: {'#Trending', '#Yax'}
How can I achieve this?
Python string case-insensitive equality check: sometimes we don't care about case when checking whether two strings are equal. In that case, use casefold(), lower() or upper() to normalize both strings before comparing them.
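A minimal sketch of such an equality check, assuming Python 3 (the helper name equal_ignore_case is made up for illustration and is not a built-in):

```python
def equal_ignore_case(a: str, b: str) -> bool:
    """Compare two strings while ignoring case differences."""
    # casefold() is more aggressive than lower(); e.g. it maps 'ß' to 'ss'
    return a.casefold() == b.casefold()


print(equal_ignore_case('#Yax', '#YAX'))       # True
print(equal_ignore_case('#Yax', '#Trending'))  # False
```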
A programming language is said to be case-sensitive if it distinguishes between uppercase and lowercase letters. Python is a case-sensitive programming language. By convention, variable, function, module and package names are written in lowercase, class names in CapWords, and constants in uppercase.
If you need to preserve case, you could use a dictionary instead. Case-fold the keys, then extract the values to a set:
set({v.casefold(): v for v in l}.values())
The str.casefold() method uses the Unicode case folding rules to normalize strings for case-insensitive comparisons. This is especially important for non-ASCII alphabets and text with ligatures, e.g. the German ß (sharp S), which is normalised to ss, or, from the same language, the ſ (long s):
>>> print(s := 'Waſſerſchloß', s.lower(), s.casefold(), sep=" - ")
Waſſerſchloß - waſſerſchloß - wasserschloss
You can encapsulate this into a class.
If you don't care about preserving case, just use a set comprehension:
{v.casefold() for v in l}
Note that Python 2 doesn't have this method; use str.lower() in that case.
Demo:
>>> l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
>>> set({v.casefold(): v for v in l}.values())
{'#Yax', '#TrendinG'}
>>> {v.lower() for v in l}
{'#trending', '#yax'}
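Note that the dictionary comprehension keeps the last-seen spelling of each tag, whereas the expected result in the question shows the first-seen ones. As a sketch of one way to keep the first variant instead, dict.setdefault() can be used (the function name unique_ignore_case is hypothetical):

```python
def unique_ignore_case(items):
    """Return one spelling per case-folded key, keeping the first one seen."""
    seen = {}
    for item in items:
        # setdefault() only stores the value if the folded key is new
        seen.setdefault(item.casefold(), item)
    return set(seen.values())


l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
print(unique_ignore_case(l) == {'#Trending', '#Yax'})  # True
```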
Wrapping the first approach into a class would look like:
try:
    # Python 3
    from collections.abc import MutableSet
except ImportError:
    # Python 2
    from collections import MutableSet


class CasePreservingSet(MutableSet):
    """String set that preserves case but tests for containment by case-folded value

    E.g. 'Foo' in CasePreservingSet(['FOO']) is True. Preserves case of *last*
    inserted variant.

    """
    def __init__(self, *args):
        # Maps case-folded key -> most recently added spelling
        self._values = {}
        if len(args) > 1:
            # str.format() rather than an f-string, so the module still parses on Python 2
            raise TypeError(
                "{} expected at most 1 argument, got {}".format(
                    type(self).__name__, len(args))
            )
        values = args[0] if args else ()
        try:
            self._fold = str.casefold  # Python 3
        except AttributeError:
            self._fold = str.lower     # Python 2
        for v in values:
            self.add(v)

    def __repr__(self):
        return '<{}{} at {:x}>'.format(
            type(self).__name__, tuple(self._values.values()), id(self))

    def __contains__(self, value):
        return self._fold(value) in self._values

    def __iter__(self):
        try:
            # Python 2
            return self._values.itervalues()
        except AttributeError:
            # Python 3
            return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def add(self, value):
        # Overwrites any earlier spelling with the same folded key
        self._values[self._fold(value)] = value

    def discard(self, value):
        try:
            del self._values[self._fold(value)]
        except KeyError:
            pass
Usage demo:
>>> cps = CasePreservingSet(l)
>>> cps
<CasePreservingSet('#TrendinG', '#Yax') at 1047ba290>
>>> '#treNdinG' in cps
True
You can use lower():
>>> set(i.lower() for i in l)
set(['#trending', '#yax'])
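For plain ASCII tags like these, lower() and casefold() give the same result. A small illustration of where they differ, assuming Python 3 and using made-up sample words with the German ß mentioned above:

```python
words = ['Straße', 'STRASSE', 'strasse']

# lower() leaves 'ß' untouched, so 'straße' and 'strasse' stay distinct
lowered = {w.lower() for w in words}

# casefold() maps 'ß' to 'ss', so all three spellings collapse into one
folded = {w.casefold() for w in words}

print(len(lowered))  # 2
print(folded)        # {'strasse'}
```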