Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to sort string to match second string - only adjacent swaps allowed

I want to get the minimum number of letter-swaps needed to convert one string to match a second string. Only adjacent swaps are allowed.

Inputs are: length of strings, string_1, string_2

Some examples:

Length | String 1 | String 2 | Output
-------+----------+----------+-------
   3   | ABC      | BCA      |   2 
   7   | AABCDDD  | DDDBCAA  |  16
   7   | ZZZAAAA  | ZAAZAAZ  |   6

Here's my code:

def letters(number, word_1, word_2):

    result = 0

    while word_1 != word_2:
        index_of_letter = word_1.find(word_2[0])
        result += index_of_letter
        word_1 = word_1.replace(word_2[0], '', 1)
        word_2 = word_2[1:]

    return result

It gives the correct results, but the calculation should stay under 20 seconds.

Here are two sets of input data (1 000 000 characters long strings): https://ufile.io/8hp46 and https://ufile.io/athxu.

On my setup the first one is executed in around 40 seconds and the second in 4 minutes.

How to calculate the result in less than 20 seconds?

like image 277
Stoiczkow Avatar asked Jan 04 '23 03:01

Stoiczkow


1 Answers

@KennyOstrom's is 90% there. The inversion count is indeed the right angle to look at this problem.

The only bit that is missing is that we need a "relative" inversion count, meaning the number of inversions not to get to normal sort order but to the other word's order. We therefore need to compute the permutation that stably maps word1 to word2 (or the other way round), and then compute the inversion count of that. Stability is important here, because obviously there will be lots of nonunique letters.

Here is a numpy implementation that takes only a second or two for the two large examples you posted. I did not test it extensively, but it does agree with @trincot's solution on all test cases. For the two large pairs it finds 1819136406 and 480769230766.

import numpy as np

_, word1, word2 = open("lit10b.in").read().split()
word1 = np.frombuffer(word1.encode('utf8')
                      + (((1<<len(word1).bit_length()) - len(word1))*b'Z'),
                      dtype=np.uint8)
word2 = np.frombuffer(word2.encode('utf8')
                      + (((1<<len(word2).bit_length()) - len(word2))*b'Z'),
                      dtype=np.uint8)
n = len(word1)

o1 = np.argsort(word1, kind='mergesort')
o2 = np.argsort(word2, kind='mergesort')
o1inv = np.empty_like(o1)
o1inv[o1] = np.arange(n)

order = o2[o1inv]

sum_ = 0
for i in range(1, len(word1).bit_length()):
    order = np.reshape(order, (-1, 1<<i))
    oo = np.argsort(order, axis = -1, kind='mergesort')
    ioo = np.empty_like(oo)
    ioo[np.arange(order.shape[0])[:, None], oo] = np.arange(1<<i)
    order[...] = order[np.arange(order.shape[0])[:, None], oo]
    hw = 1<<(i-1)
    sum_ += ioo[:, :hw].sum() - order.shape[0] * (hw-1)*hw // 2

print(sum_)
like image 184
Paul Panzer Avatar answered Jan 05 '23 15:01

Paul Panzer