Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm - find all permutations of string a in string b

Say we have string a = "abc" string b = "abcdcabaabccbaa"

Find location of all permutations of a in b. I am trying to find an effective algorithm for this.

Pseudo code:

sort string a // O(a loga)

for windows of length a in b  // O(b)?
   sort that window of b      // O(~a loga)?
   compare to a
   if equal
      save the index

So would this be a correct algorithm? Run time would be around O(aloga + ba loga) ~= O(a loga b)? How efficient would this be? Possibly way to reduce to O(a*b) or better?

like image 933
Bao Thai Avatar asked Jan 06 '17 22:01

Bao Thai


2 Answers

sorting is very expensive, and doesn't use the fact you move along b with a sliding window.

I would use a comparison method that is location agnostic (since any permutation is valid) - assign each letter a prime number, and each string will be the multiplication of its letter values.

this way, as you go over b, each step requires just dividing by the letter you remove from he left, and multiplying with the next letter.

You also need to convince yourself that this indeed matches uniquely for each string and covers all permutations - this comes from the uniqueness of prime decomposition. Also note that on larger strings the numbers get big so you may need some library for large numbers

like image 166
Leeor Avatar answered Oct 26 '22 09:10

Leeor


There is no need to hash, you can just count frequencies on your sliding window, and check if it matches. Assuming the size of your alphabet is s, you get a very simple O(s(n + m)) algorithm.

// a = [1 .. m] and b = [1 .. n] are the input
cnta = [1 .. s] array initialized to 0
cntb = [1 .. s] array initialized to 0
// nb_matches = the number of i s.t. cnta[i] = cntb[i]
// thus the current subword = a iff. nb_matches = s
nb_matches = s

for i = 1 to m:
    if cntb[a[i]] = 0: nb_matches -= 1
    cntb[a[i]] += 1

ans = 0
for i = 1 to n:
    if cntb[b[i]] = cnta[b[i]]: nb_matches -= 1
    cntb[b[i]] += 1
    if nb_matches = s: ans += 1
    if cntb[b[i]] = cnta[b[i]]: nb_matches += 1
    if i - m + 1 >= 1:
        if cntb[b[i - m + 1]] = cnta[b[i - m + 1]]: nb_matches -= 1
        cntb[b[i - m + 1]] += 1
        if cntb[b[i - m + 1]] = cnta[b[i - m + 1]]: nb_matches += 1
        cntb[b[i - m + 1]] -= 1
return ans
like image 26
md5 Avatar answered Oct 26 '22 07:10

md5