
Python Sort - Semi Ignore Case (a, aa, A, AA, b, bb, B, BB...)

Tags: python, sorting

How can I sort a list to end up with:

['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

Assume a shuffled version of it for convenience:

['bb', 'a', 'B', 'BB', 'AAA', 'BBB', 'b', 'aa', 'aaa', 'A', 'AA', 'bbb']

I tried sorting by ignoring case:

l = sorted(l, key=lambda x: x.lower())

which interleaves the two cases instead of grouping them, giving something like ['a', 'A', 'aa', 'AA', 'aaa', 'AAA', ...]
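To see why this falls short, here is a small sketch using the shuffled list above. Lowercasing maps 'a' and 'A' to the same key, and because Python's sort is stable, such ties simply keep their input order.

```python
# Shuffled input from the question.
shuffled = ['bb', 'a', 'B', 'BB', 'AAA', 'BBB', 'b', 'aa', 'aaa', 'A', 'AA', 'bbb']

# Case-insensitive key: 'aaa' and 'AAA' compare equal, so the stable sort
# leaves them in whatever order they appeared in the input.
by_lower = sorted(shuffled, key=str.lower)
print(by_lower)
# ['a', 'A', 'aa', 'AA', 'AAA', 'aaa', 'B', 'b', 'bb', 'BB', 'BBB', 'bbb']
```

Note that 'AAA' lands before 'aaa' and 'B' before 'b' only because they came first in the shuffled input.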


From the answers below, there are two solutions for the mixed case; I'm not sure which is better.

L = ['ABC1', 'abc1', 'ABC2', 'abc2', 'Abc']
l = sorted(L, key=lambda x: "".join([y.lower() + y.swapcase() for y in x]))
print(l)
l = sorted(L, key=lambda x: [(c.lower(), c.isupper()) for c in x])
print(l)
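
As a rough check rather than a proof, the sketch below runs both keys on the same list and compares the results. The names `doubled_key` and `tuple_key` are made up for this illustration. For plain ASCII strings the two keys should order elements the same way; that may not hold for characters whose `lower()` is more than one character long.

```python
words = ['ABC1', 'abc1', 'ABC2', 'abc2', 'Abc']

def doubled_key(s):
    # 'a' -> 'aA', 'A' -> 'aa', '1' -> '11' (digits have no case to swap)
    return "".join(c.lower() + c.swapcase() for c in s)

def tuple_key(s):
    # 'a' -> ('a', False), 'A' -> ('a', True), '1' -> ('1', False)
    return [(c.lower(), c.isupper()) for c in s]

by_doubled = sorted(words, key=doubled_key)
by_tuple = sorted(words, key=tuple_key)

print(by_doubled)              # ['abc1', 'abc2', 'Abc', 'ABC1', 'ABC2']
print(by_doubled == by_tuple)  # True
```

The tuple version states the intent (letter first, then case) more directly, while the string version relies on uppercase letters having lower code points than lowercase ones.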
minion asked Jun 15 '18 09:06


3 Answers

You can use sorted() with a custom key function:

>>> L = ['bb', 'a', 'B', 'BB', 'AAA', 'BBB', 'b', 'aa', 'aaa', 'A', 'AA', 'bbb']
>>> sorted(L, key=lambda x: (x[0].lower(), x[0].isupper(), len(x)))
['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

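This works by comparing, in order, each element's lowercased first character, whether that first character is uppercase, and finally the element's length. It assumes every element consists of a single letter repeated in one case, as in the question.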

P.S. To also handle mixed-case and mixed-character elements you'd need to compare tuples for individual characters, e.g.:

>>> L = ['ab', 'aA', 'bb', 'a', 'B', 'BB', 'b', 'aa', 'A', 'AA']
>>> sorted(L, key=lambda x: [(c.lower(), c.isupper()) for c in x])
['a', 'aa', 'aA', 'ab', 'A', 'AA', 'b', 'bb', 'B', 'BB']
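
A small sketch of why the per-character key is needed for such inputs: with the first-character key, elements that share a first character and a length all tie, so their relative order just mirrors the input. The helper names `first_char_key` and `per_char_key` are introduced here for illustration only.

```python
def first_char_key(s):
    # Only looks at the first character and the length.
    return (s[0].lower(), s[0].isupper(), len(s))

def per_char_key(s):
    # Looks at every character: letter first, then case.
    return [(c.lower(), c.isupper()) for c in s]

mixed = ['ab', 'aA', 'aa']

# All three keys are ('a', False, 2), so the stable sort keeps input order.
print(sorted(mixed, key=first_char_key))  # ['ab', 'aA', 'aa']

# Every character takes part in the comparison.
print(sorted(mixed, key=per_char_key))    # ['aa', 'aA', 'ab']
```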
Eugene Yarmash answered Oct 16 '22 06:10


TLDR

result = sorted(lst, key=lambda s: [(c.lower(), c.isupper()) for c in s])

You can transform each string into a list of tuples, one per character. The tuple for a character c takes the form (c.lower(), c.isupper()). Python's usual lexicographic list comparison then gives the desired order.

lst = ["a", "aa", "aaa", "A", "AA", "AAA", "b", "bb", "bbb", "B", "BB", "BBB"]

lsts = [[(c.lower(), c.isupper()) for c in s] for s in lst]

# [[('a', False)],
# [('a', False), ('a', False)],
# [('a', False), ('a', False), ('a', False)],
# [('a', True)],
# [('a', True), ('a', True)],
# [('a', True), ('a', True), ('a', True)],
# [('b', False)],
# [('b', False), ('b', False)],
# [('b', False), ('b', False), ('b', False)],
# [('b', True)],
# [('b', True), ('b', True)],
# [('b', True), ('b', True), ('b', True)]]

Recovering the strings from the tuples:

res = ["".join(c.upper() if u else c for c, u in ls) for ls in lsts]

# ['a', 'aa', 'aaa', 'A', 'AA', 'AAA', 'b', 'bb', 'bbb', 'B', 'BB', 'BBB']

Note that there are many distinct ways to order mixed-case elements that are consistent with the OP's original example. This approach is the only reasonable sort I can think of that arises from an antisymmetric order relation. In particular, distinct strings never compare as equivalent under this key, so the result does not depend on the input order.

For example, ['aAa', 'aaA'] and ['aaA', 'aAa'] will lead to the same output of ['aaA', 'aAa'].
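
The order-independence claim can be checked by brute force on a small input. This is only a sketch: `per_char_key` is a name introduced here, and the four-word list is an arbitrary example. Every permutation of the input should sort to the same output, whereas a plain case-insensitive key leaves all four words tied.

```python
from itertools import permutations

def per_char_key(s):
    return [(c.lower(), c.isupper()) for c in s]

words = ['aAa', 'aaA', 'Aaa', 'aaa']

# Collect every distinct result obtained from any input order.
with_tuples = {tuple(sorted(p, key=per_char_key)) for p in permutations(words)}
with_lower = {tuple(sorted(p, key=str.lower)) for p in permutations(words)}

print(with_tuples)      # {('aaa', 'aaA', 'aAa', 'Aaa')}
print(len(with_lower))  # 24: every tie keeps its input order
```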

hilberts_drinking_problem answered Oct 16 '22 07:10


Short answer:

sorted(l, key=lambda x: "".join([y.lower() + y.swapcase() for y in x]))

Each word is transformed by doubling each letter: the first copy is the lowercase version of the letter, and the second is the case-swapped version. The second copy is swapped so that lowercase sorts before uppercase, since uppercase ASCII letters have lower code points than lowercase ones.
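
To make the transformation concrete, here is a minimal sketch that prints the key built for a few words. The function name `doubled_key` is only for illustration.

```python
def doubled_key(s):
    # Each character c becomes c.lower() followed by c.swapcase().
    return "".join(c.lower() + c.swapcase() for c in s)

for word in ['a', 'A', 'aa', 'AA']:
    print(word, '->', doubled_key(word))
# a -> aA
# A -> aa
# aa -> aAaA
# AA -> aaaa

# 'aA' < 'aAaA' < 'aa' < 'aaaa' because 'A' (65) sorts before 'a' (97).
print(sorted(['AA', 'A', 'aa', 'a'], key=doubled_key))  # ['a', 'aa', 'A', 'AA']
```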

Olivier Cazade answered Oct 16 '22 06:10