Simplify sort of Excel cell names in python

I am learning Python and am having trouble with sorting. The key argument feels too limiting and difficult to use once the sort order gets more complicated. Here is the list I want to sort:

['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

where each value is an Excel-style cell name (i.e. 'BB1' > 'AZ15' > 'AA1' > 'B3' > 'B2' > 'A1').

Here is the solution I came up with after reading the following guide.

def cmp_cell_ids(name1, name2):
    def split(name):
        letter = ''
        number = ''
        for ch in name:
            if ch in '1234567890':
                number += ch
            else:
                letter += ch
        return letter, int(number)
    ltr1, num1 = split(name1)
    ltr2, num2 = split(name2)
    if len(ltr1) == len(ltr2):
        if ltr1 == ltr2:
            return num1 > num2
        else:
            return ltr1 > ltr2
    return len(ltr1) > len(ltr2)

def cmp_to_key(mycmp):
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return not mycmp(self.obj, other.obj)
        def __gt__(self, other):
            return mycmp(self.obj, other.obj)
        def __eq__(self, other):
            return self.obj == other.obj
        def __le__(self, other):
            if self.__eq__(other):
                return True
            return self.__lt__(other)
        def __ge__(self, other):
            if self.__eq__(other):
                return True
            return self.__gt__(other)
        def __ne__(self, other):
            return self.obj != other.obj
    return K

key_cell_ids_cmp = cmp_to_key(cmp_cell_ids)
cell_ids = ['A1','AA1','B3','B2','BB1','AZ15']
cell_ids.sort(key=key_cell_ids_cmp)
print(cell_ids)

Which gives the desired output

['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']

I am wondering if there is an easier or more Pythonic way to implement this (in particular, I would love to get rid of that wrapper class).

Yaroslav Avatar asked Mar 06 '23 04:03



2 Answers

First of all, writing (or copy-pasting) a cmp_to_key function is unnecessary. Just use the one in functools.
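For reference, here is a minimal sketch of that route. One caveat: functools.cmp_to_key expects the comparison function to return a negative number, zero, or a positive number, so the boolean-returning cmp_cell_ids from the question would need adapting. The names split_cell and cmp_cells below are illustrative, not from the question.

```python
from functools import cmp_to_key

def split_cell(name):
    # Separate the leading column letters from the trailing row digits.
    letters = name.rstrip('0123456789')
    return letters, int(name[len(letters):])

def cmp_cells(name1, name2):
    # Three-way comparison: negative, zero, or positive, as cmp_to_key expects.
    ltr1, num1 = split_cell(name1)
    ltr2, num2 = split_cell(name2)
    if len(ltr1) != len(ltr2):
        return len(ltr1) - len(ltr2)      # shorter column names come first
    if ltr1 != ltr2:
        return -1 if ltr1 < ltr2 else 1   # same length: alphabetical order
    return num1 - num2                    # same column: order by row number

cell_ids = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']
sorted_ids = sorted(cell_ids, key=cmp_to_key(cmp_cells))
print(sorted_ids)
```

This removes the hand-written wrapper class but still keeps a comparison function around.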

In this case, though, it would be a lot more natural to use a key! Just split each element into a tuple of column name length (so B is before AA), a string column, and an integer row, and rely on the natural lexicographic ordering of tuples.

Viz:

import re

def cell_key(cell):
    m = re.match("([A-Z]+)(\\d+)", cell)
    return (len(m.group(1)), m.group(1), int(m.group(2)))

cells = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

print(sorted(cells, key=cell_key))
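
To see why the leading length component matters, here is a small sketch that sorts the same list with and without it. The name key_without_length is made up for the comparison, and cell_key is repeated so the snippet runs on its own.

```python
import re

def cell_key(cell):
    m = re.match(r"([A-Z]+)(\d+)", cell)
    return (len(m.group(1)), m.group(1), int(m.group(2)))

def key_without_length(cell):
    # Hypothetical variant: the same key minus the leading length component.
    return cell_key(cell)[1:]

cells = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

with_length = sorted(cells, key=cell_key)
without_length = sorted(cells, key=key_without_length)

print(with_length)     # B2 and B3 come before AA1
print(without_length)  # AA1 and AZ15 sort ahead of B2
```

Without the length, 'AA' compares less than 'B' as a plain string, which is not the spreadsheet column order.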
Sneftel Avatar answered Mar 08 '23 19:03



Very similar solution to @Sneftel's, but I approached the problem by finding the index of the first numeric character.

import re

A = ['A1', 'AA1', 'B3', 'B2', 'BB1', 'AZ15']

def sorter(x):
    n = re.search(r'\d', x).start()  # index of the first digit; raw string avoids an invalid-escape warning
    return (len(x[:n]), x[:n], int(x[n:]))

res = sorted(A, key=sorter)

print(res)

['A1', 'B2', 'B3', 'AA1', 'AZ15', 'BB1']
jpp Avatar answered Mar 08 '23 19:03
