I am learning Python and am having trouble with sorting. I feel like the key parameter is too limiting and difficult to use once the sorting logic gets more complicated. Here is the list I want to sort:
['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
where each value is an Excel-style cell reference, ordered by column first and then by row (i.e. 'BB1' > 'AZ15' > 'AA1' > 'B3' > 'B2' > 'A1').
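For comparison, a plain sort compares these strings character by character, which is why it does not produce the Excel order. A quick illustration:

```python
cells = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

# Default string ordering is character by character, so 'AA1' and 'AZ15'
# land before 'B2' even though column B comes before column AA in Excel.
lexicographic = sorted(cells)
print(lexicographic)  # ['A1', 'AA1', 'AZ15', 'B2', 'B3', 'BB1']
```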
Here is the solution I came up with after reading the following guide.
```python
def cmp_cell_ids(name1, name2):
    def split(name):
        letter = ''
        number = ''
        for ch in name:
            if ch in '1234567890':
                number += ch
            else:
                letter += ch
        return letter, int(number)

    ltr1, num1 = split(name1)
    ltr2, num2 = split(name2)
    if len(ltr1) == len(ltr2):
        if ltr1 == ltr2:
            return num1 > num2
        else:
            return ltr1 > ltr2
    return len(ltr1) > len(ltr2)


def cmp_to_key(mycmp):
    class K:
        def __init__(self, obj, *args):
            self.obj = obj

        def __lt__(self, other):
            return not mycmp(self.obj, other.obj)

        def __gt__(self, other):
            return mycmp(self.obj, other.obj)

        def __eq__(self, other):
            return self.obj == other.obj

        def __le__(self, other):
            if self.__eq__(other):
                return True
            return self.__lt__(other)

        def __ge__(self, other):
            if self.__eq__(other):
                return True
            return self.__gt__(other)

        def __ne__(self, other):
            return self.obj != other.obj

    return K


key_cell_ids_cmp = cmp_to_key(cmp_cell_ids)
cell_ids = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
cell_ids.sort(key=key_cell_ids_cmp)
print(cell_ids)
```
This gives the desired output:
['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
I am wondering whether there is an easier or more Pythonic way to do this (in particular, I would love to get rid of that wrapper class).
First of all, writing (or copy-pasting) a cmp_to_key function is unnecessary. Just use functools.cmp_to_key from the standard library.
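One caveat when switching: functools.cmp_to_key expects the comparison function to return a negative number, zero, or a positive number, whereas the question's cmp_cell_ids returns a boolean. Because cmp_to_key only treats negative results as "less than", passing the boolean version in unchanged would leave the list effectively unsorted. A minimal sketch of a three-way version (the split helper mirrors the question's letter/digit loop):

```python
from functools import cmp_to_key


def split(name):
    """Split a cell id such as 'AZ15' into ('AZ', 15)."""
    letters = ''.join(ch for ch in name if not ch.isdigit())
    digits = ''.join(ch for ch in name if ch.isdigit())
    return letters, int(digits)


def cmp_cell_ids(name1, name2):
    """Return <0, 0 or >0, as functools.cmp_to_key expects."""
    ltr1, num1 = split(name1)
    ltr2, num2 = split(name2)
    if len(ltr1) != len(ltr2):
        return len(ltr1) - len(ltr2)      # shorter column name first (B before AA)
    if ltr1 != ltr2:
        return -1 if ltr1 < ltr2 else 1   # same length: alphabetical
    return num1 - num2                    # same column: compare row numbers


cell_ids = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
cell_ids.sort(key=cmp_to_key(cmp_cell_ids))
print(cell_ids)  # ['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
```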
In this case, though, it would be a lot more natural to use a key! Just split each element into a tuple of the column name's length (so B sorts before AA), the column letters as a string, and the row number as an integer, and rely on the natural lexicographic ordering of tuples.
Viz:
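```python
import re

def cell_key(cell):
    # Split e.g. 'AZ15' into the column letters 'AZ' and the row digits '15'.
    m = re.match(r"([A-Z]+)(\d+)", cell)
    # Shorter column names first (B before AA), then alphabetical, then by row.
    return (len(m.group(1)), m.group(1), int(m.group(2)))

cells = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
print(sorted(cells, key=cell_key))
```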
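A variation on the same key idea (not from the answer above, just a sketch): treat the column letters as a bijective base-26 number ('A' = 1, 'Z' = 26, 'AA' = 27). For uppercase-only column names this yields the same order as the (length, letters) pair, and the numeric index can be handy if you also need the column position. The helper names column_number and cell_key_numeric are hypothetical, and the sketch assumes ids are uppercase letters followed by digits.

```python
import re


def column_number(letters):
    """Convert Excel column letters to a 1-based index: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n


def cell_key_numeric(cell):
    """Key of (column index, row number); assumes uppercase letters then digits."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", cell)
    if m is None:
        raise ValueError(f"not a cell reference: {cell!r}")
    return (column_number(m.group(1)), int(m.group(2)))


cells = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
print(sorted(cells, key=cell_key_numeric))  # ['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
```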
Very similar solution to @Sneftel's, but I approached the problem by finding the index of the first numeric character.
```python
import re

A = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

def sorter(x):
    # The index of the first digit separates the column letters from the row number.
    n = re.search(r'\d', x).start()
    return (len(x[:n]), x[:n], int(x[n:]))

res = sorted(A, key=sorter)
print(res)
```
['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
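The same split can also be done without a regular expression, assuming every id is letters followed by digits: str.rstrip(string.digits) removes the trailing row number and leaves the column letters. The name sorter_no_regex is hypothetical; note that it does not validate input, so an id with no letters (e.g. '15') would silently get an empty column name.

```python
import string


def sorter_no_regex(x):
    # Strip trailing digits to get the column letters; the remainder is the row.
    letters = x.rstrip(string.digits)
    row = int(x[len(letters):])
    return (len(letters), letters, row)


A = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
res_no_regex = sorted(A, key=sorter_no_regex)
print(res_no_regex)  # ['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
```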