 

How to get Case Insensitive Python SET

Tags:

python

I have a list of strings:

In [53]: l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']

In [54]: set(l)
Out[54]: {'#TrendinG', '#Trending', '#YAX', '#Yax'}

I want to have a case-insensitive set of this list.

Expected Result:

Out[55]: {'#Trending', '#Yax'}

How can I achieve this?

Yax asked Dec 17 '14 17:12



People also ask

How do you do a case-insensitive check in Python?

Sometimes you don't care about case when checking whether two strings are equal. In that situation, normalize both strings with casefold(), lower(), or upper() before comparing them; casefold() is the most thorough of the three for non-ASCII text.
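A minimal sketch of such a check, using casefold() on both sides (the helper name `equals_ignore_case` is hypothetical, chosen for illustration):

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Return True if a and b are equal after Unicode case folding."""
    # casefold() is a more aggressive lower(): for example 'ß' folds to 'ss'.
    return a.casefold() == b.casefold()


print(equals_ignore_case("#Trending", "#TRENDING"))  # True
print(equals_ignore_case("Straße", "STRASSE"))       # True
print(equals_ignore_case("#Yax", "#Yak"))            # False
```

With lower() in place of casefold(), the second comparison would return False, because "Straße".lower() keeps the ß.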

Is Python case-sensitive?

A programming language is case-sensitive if it distinguishes between uppercase and lowercase letters. Python is a case-sensitive programming language. By convention (PEP 8), variable, function, module, and package names are written in lowercase, class names in CapWords, and constants in uppercase.
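A short illustration (the variable names are arbitrary examples): names that differ only in case are separate bindings, and plain string comparison is case-sensitive too.

```python
# Identifiers that differ only in case are distinct names in Python.
trending = "lowercase variable"
Trending = "capitalized variable"

print(trending == Trending)  # False: two separate bindings

# Ordinary string equality is case-sensitive as well.
print("#Yax" == "#YAX")  # False
```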


2 Answers

If you need to preserve case, you could use a dictionary instead. Case-fold the keys, then extract the values to a set:

 set({v.casefold(): v for v in l}.values())

The str.casefold() method uses the Unicode case folding rules to normalize strings for case-insensitive comparisons. This is especially important for non-ASCII alphabets and for text with ligatures, e.g. the German ß (sharp s), which is normalized to ss, or, from the same language, the ſ (long s):

>>> print(s := 'Waſſerſchloß', s.lower(), s.casefold(), sep=" - ")
Waſſerſchloß - waſſerſchloß - wasserschloss
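To see how this affects the set itself, here is a small sketch comparing sets built with lower() and with casefold() (the sample words are illustrative, not from the question):

```python
words = ["Straße", "STRASSE", "strasse"]

# lower() leaves 'ß' untouched, so 'straße' and 'strasse' stay distinct.
lowered = {w.lower() for w in words}

# casefold() maps 'ß' to 'ss', so all three spellings collapse into one entry.
folded = {w.casefold() for w in words}

print(sorted(lowered))  # ['strasse', 'straße']
print(sorted(folded))   # ['strasse']
```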

You can encapsulate this into a class.

If you don't care about preserving case, just use a set comprehension:

{v.casefold() for v in l}

Note that Python 2 doesn't have this method; use str.lower() there instead.
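If the same code has to run on both versions, one option (a sketch, not part of the original answer) is to look the method up with getattr() and fall back to str.lower. On Python 2 this fallback only accepts byte strings (`str`), not `unicode` objects.

```python
# Use str.casefold where it exists (Python 3.3+), otherwise fall back to str.lower.
fold = getattr(str, "casefold", str.lower)

l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
print(sorted({fold(v) for v in l}))  # ['#trending', '#yax']
```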

Demo:

>>> l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
>>> set({v.casefold(): v for v in l}.values())
{'#Yax', '#TrendinG'}
>>> {v.lower() for v in l}
{'#trending', '#yax'}
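Because later dictionary entries overwrite earlier ones, the dict-based version keeps the *last* spelling seen ('#TrendinG'), while the expected result in the question keeps the first ('#Trending'). A minimal sketch that keeps the first spelling instead, using dict.setdefault() (the helper name `unique_ignore_case` is hypothetical):

```python
def unique_ignore_case(items):
    """Return a set with one entry per case-folded value, keeping the first spelling seen."""
    seen = {}
    for item in items:
        # setdefault only stores item if its folded key is not present yet,
        # so the first spelling wins and later variants are ignored.
        seen.setdefault(item.casefold(), item)
    return set(seen.values())


l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
print(unique_ignore_case(l) == {'#Trending', '#Yax'})  # True
```

Since dicts preserve insertion order on Python 3.7+, returning `list(seen.values())` instead would also keep the original order of first appearance.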

Wrapping the first approach into a class would look like:

try:
    # Python 3
    from collections.abc import MutableSet
except ImportError:
    # Python 2
    from collections import MutableSet

class CasePreservingSet(MutableSet):
    """String set that preserves case but tests for containment by case-folded value

    E.g. 'Foo' in CasePreservingSet(['FOO']) is True. Preserves case of *last*
    inserted variant.

    """
    def __init__(self, *args):
        self._values = {}
        if len(args) > 1:
            # str.format rather than an f-string, so the module still parses on Python 2
            raise TypeError("{} expected at most 1 argument, got {}".format(
                type(self).__name__, len(args)))
        values = args[0] if args else ()
        try:
            self._fold = str.casefold  # Python 3
        except AttributeError:
            self._fold = str.lower     # Python 2
        for v in values:
            self.add(v)

    def __repr__(self):
        return '<{}{} at {:x}>'.format(
            type(self).__name__, tuple(self._values.values()), id(self))

    def __contains__(self, value):
        return self._fold(value) in self._values

    def __iter__(self):
        try:
            # Python 2
            return self._values.itervalues()
        except AttributeError:
            # Python 3
            return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def add(self, value):
        self._values[self._fold(value)] = value

    def discard(self, value):
        try:
            del self._values[self._fold(value)]
        except KeyError:
            pass

Usage demo:

>>> cps = CasePreservingSet(l)
>>> cps
<CasePreservingSet('#TrendinG', '#Yax') at 1047ba290>
>>> '#treNdinG' in cps
True
Martijn Pieters answered Oct 18 '22 09:10



You can use lower():

>>> set(i.lower() for i in l)
set(['#trending', '#yax'])
Mazdak answered Oct 18 '22 08:10
