Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Average of two strings in alphabetical/lexicographical order

Suppose you take the strings 'a' and 'z' and list all the strings that come between them in alphabetical order: ['a','b','c' ... 'x','y','z']. Take the midpoint of this list and you find 'm'. So this is kind of like taking an average of those two strings.

You could extend it to strings with more than one character, for example the midpoint between 'aa' and 'zz' would be found in the middle of the list ['aa', 'ab', 'ac' ... 'zx', 'zy', 'zz'].

Might there be a Python method somewhere that does this? If not, even knowing the name of the algorithm would help.

I began making my own routine that simply goes through both strings and finds midpoint of the first differing letter, which seemed to work great in that 'aa' and 'az' midpoint was 'am', but then it fails on 'cat', 'doggie' midpoint which it thinks is 'c'. I tried Googling for "binary search string midpoint" etc. but without knowing the name of what I am trying to do here I had little luck.

I added my own solution as an answer

like image 445
Bemmu Avatar asked Mar 24 '10 19:03

Bemmu


People also ask

How do you sort strings in lexicographical order?

To sort the strings we can use the inbuilt sort() function. The function sorts the string in lexicographical order by default.

How do you compare two strings lexicographically in Python?

It has the syntax: str1 <= str2 where str1 and str2 are the strings to be compared. greater than equal to(>=)-This operator returns True if the first string is lexicographically larger than or equal to the second string, otherwise it returns False.


2 Answers

If you define an alphabet of characters, you can just convert to base 10, do an average, and convert back to base-N where N is the size of the alphabet.

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def enbase(x):
    n = len(alphabet)
    if x < n:
        return alphabet[x]
    return enbase(x/n) + alphabet[x%n]

def debase(x):
    n = len(alphabet)
    result = 0
    for i, c in enumerate(reversed(x)):
        result += alphabet.index(c) * (n**i)
    return result

def average(a, b):
    a = debase(a)
    b = debase(b)
    return enbase((a + b) / 2)

print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('cat', 'doggie') #budeel
print average('google', 'microsoft') #gebmbqkil
print average('microsoft', 'google') #gebmbqkil

Edit: Based on comments and other answers, you might want to handle strings of different lengths by appending the first letter of the alphabet to the shorter word until they're the same length. This will result in the "average" falling between the two inputs in a lexicographical sort. Code changes and new outputs below.

def pad(x, n):
    p = alphabet[0] * (n - len(x)) 
    return '%s%s' % (x, p)

def average(a, b):
    n = max(len(a), len(b))
    a = debase(pad(a, n))
    b = debase(pad(b, n))
    return enbase((a + b) / 2)

print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('aa', 'az') #m (equivalent to ma)
print average('cat', 'doggie') #cumqec
print average('google', 'microsoft') #jlilzyhcw
print average('microsoft', 'google') #jlilzyhcw
like image 130
FogleBird Avatar answered Sep 22 '22 18:09

FogleBird


If you mean the alphabetically, simply use FogleBird's algorithm but reverse the parameters and the result!

>>> print average('cat'[::-1], 'doggie'[::-1])[::-1]
cumdec

or rewriting average like so

>>> def average(a, b):
...     a = debase(a[::-1])
...     b = debase(b[::-1])
...     return enbase((a + b) / 2)[::-1]
... 
>>> print average('cat', 'doggie')
cumdec
>>> print average('google', 'microsoft') 
jlvymlupj
>>> print average('microsoft', 'google') 
jlvymlupj
like image 31
John La Rooy Avatar answered Sep 23 '22 18:09

John La Rooy