
How to find all permutations of a given word in a given text?

This is an interview question (phone screen): write a function (in Java) to find all permutations of a given word that appear in a given text. For example, for word abc and text abcxyaxbcayxycab the function should return abc, bca, cab.

I would answer this question as follows:

  • Obviously I can loop over all permutations of the given word and use a standard substring function. However it might be difficult (for me right now) to write code to generate all word permutations.

  • It is easier to loop over all text substrings of the word size, sort each substring and compare it with the "sorted" given word. I can code such a function immediately.

  • I can probably modify some substring search algorithm but I do not remember these algorithms now.
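
The second idea above (sort each window and compare it with the sorted word) could be sketched as follows. The class and method names are only illustrative, and the sketch assumes that matches should be reported in order of appearance, duplicates included. It runs in O(n · m log m) for a text of length n and a word of length m.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedWindowSearch {
    // Returns every substring of text that is a permutation of word,
    // in order of appearance (hypothetical helper, not from the question).
    static List<String> findPermutations(String word, String text) {
        List<String> found = new ArrayList<>();
        int m = word.length();
        if (m == 0 || m > text.length()) {
            return found;
        }
        char[] sortedWord = word.toCharArray();
        Arrays.sort(sortedWord);
        for (int i = 0; i + m <= text.length(); i++) {
            // Sort a copy of the current window and compare it with the sorted word
            char[] window = text.substring(i, i + m).toCharArray();
            Arrays.sort(window);
            if (Arrays.equals(window, sortedWord)) {
                found.add(text.substring(i, i + m));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [abc, bca, cab]
        System.out.println(findPermutations("abc", "abcxyaxbcayxycab"));
    }
}
```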

How would you answer this question?

Michael asked May 23 '12 20:05



3 Answers

This is probably not the most efficient solution algorithmically, but it is clean from a class design point of view. This solution takes the approach of comparing "sorted" given words.

We can say that a word is a permutation of another if it contains the same letters with the same counts. This means that you can convert the word from a String to a Map<Character,Integer>. The conversion has complexity O(n), where n is the length of the String, assuming that insertions into your Map implementation cost O(1).

The Map will contain as keys all the characters found in the word and as values the frequencies of the characters.

Example. abbc is converted to [a->1, b->2, c->1]

bacb is converted to [a->1, b->2, c->1]

So if you need to know whether two words are permutations of each other, you can convert them both into maps and then invoke Map.equals.

Then you have to iterate over the text and apply the transformation to every substring that has the same length as the word you are looking for.
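
A minimal sketch of this idea, assuming a HashMap as the Map implementation. The names toFrequencyMap and findMatchPositions are made up for illustration. Each window is converted from scratch, so the cost is O(n · m).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyMapSearch {
    // Converts a string to character counts, e.g. "abbc" -> {a=1, b=2, c=1}
    static Map<Character, Integer> toFrequencyMap(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the start index of every window whose counts equal the needle's counts
    static List<Integer> findMatchPositions(String needle, String haystack) {
        List<Integer> positions = new ArrayList<>();
        int m = needle.length();
        if (m == 0) {
            return positions;
        }
        Map<Character, Integer> target = toFrequencyMap(needle);
        for (int i = 0; i + m <= haystack.length(); i++) {
            if (toFrequencyMap(haystack.substring(i, i + m)).equals(target)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [0, 7, 13]
        System.out.println(findMatchPositions("abc", "abcxyaxbcayxycab"));
    }
}
```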

Improvement proposed by Inerdial

This approach can be improved by updating the Map in a "rolling" fashion.

I.e. if you're matching at index i=3 in the example haystack in the OP (the substring xya), the map will be [a->1, x->1, y->1]. When advancing in the haystack, decrement the character count for haystack[i], and increment the count for haystack[i+needle.length()].

(Drop entries whose count reaches zero so that Map.equals() keeps working, or implement a custom comparison.)
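
The rolling update might look roughly like this. It is a sketch under the same HashMap assumption, with illustrative names. Keys are removed when their count reaches zero, so Map.equals can be used directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RollingMapSearch {
    static List<Integer> findMatchPositions(String needle, String haystack) {
        List<Integer> positions = new ArrayList<>();
        int m = needle.length();
        if (m == 0 || m > haystack.length()) {
            return positions;
        }
        Map<Character, Integer> target = new HashMap<>();
        Map<Character, Integer> window = new HashMap<>();
        for (int i = 0; i < m; i++) {
            target.merge(needle.charAt(i), 1, Integer::sum);
            window.merge(haystack.charAt(i), 1, Integer::sum);
        }
        if (window.equals(target)) {
            positions.add(0);
        }
        for (int i = 0; i + m < haystack.length(); i++) {
            // haystack[i] leaves the window: decrement, dropping the key at zero
            window.computeIfPresent(haystack.charAt(i), (k, v) -> v == 1 ? null : v - 1);
            // haystack[i + m] enters the window
            window.merge(haystack.charAt(i + m), 1, Integer::sum);
            if (window.equals(target)) {
                positions.add(i + 1);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [0, 7, 13]
        System.out.println(findMatchPositions("abc", "abcxyaxbcayxycab"));
    }
}
```

Each step still compares two maps, which costs time proportional to the number of distinct letters in the needle.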

Improvement proposed by Max

What if we also introduce a matchedCharactersCnt variable (called matchedLength in the pseudocode below)? At the beginning of the haystack it is 0. Every time a change moves the map towards the desired value, you increment the variable. Every time a change moves it away from the desired value, you decrement the variable. On each iteration you check whether the variable equals the length of the needle. If it does, you've found a match. This is faster than comparing the full map every time.

Pseudocode provided by Max:

needle = "abbc"
haystack = "abbcbbabbcaabbca"

needleSize = needle.length()
//Map of needle character counts; characters missing from a map read as 0
targetMap = [a->1, b->2, c->1]

matchedLength = 0
curMap = [a->0, b->0, c->0]
//Initial map initialization
for (int i=0;i<needleSize;i++) {
    //A character counts as matched only while the window still needs it
    if (curMap[haystack[i]] < targetMap[haystack[i]]) {
        matchedLength++
    }
    curMap[haystack[i]]++
}

if (matchedLength == needleSize) {
    System.out.println("Match found at: 0");
}

//Search itself
for (int i=0;i<haystack.length()-needleSize;i++) {
    int targetValue1 = targetMap[haystack[i]]; //Reading from hashmap, O(1)
    int curValue1 = curMap[haystack[i]]; //Another read
    //If we are removing beneficial character
    if (targetValue1 > 0 && curValue1 > 0 && curValue1 <= targetValue1) {
        matchedLength--;
    }
    curMap[haystack[i]] = curValue1 - 1; //Write to hashmap, O(1)

    int targetValue2 = targetMap[haystack[i+needleSize]]; //Read
    int curValue2 = curMap[haystack[i+needleSize]]; //Read
    //We are adding a beneficial character
    if (targetValue2 > 0 && curValue2 < targetValue2) { //Letters we don't need, or already have enough of, don't count
        matchedLength++;
    }
    curMap[haystack[i+needleSize]] = curValue2 + 1; //Write

    if (matchedLength == needleSize) {
        System.out.println("Match found at: "+(i+1));
    }
}

//Basically with 4 reads and 2 writes per step, which are
//independent of the size of the needle,
//we get to the maximal possible performance: O(n)
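
A runnable Java sketch of the counter idea, for comparison. It is an interpretation rather than Max's exact code: it uses int arrays indexed by char instead of hash maps (an assumption made here to keep it short), and the class and method names are illustrative. The counter always equals the number of window characters that the needle still "needs", so it reaches the needle length exactly when the window is a permutation.

```java
import java.util.ArrayList;
import java.util.List;

class MatchedCounterSearch {
    static List<Integer> findMatchPositions(String needle, String haystack) {
        List<Integer> positions = new ArrayList<>();
        int m = needle.length();
        if (m == 0 || m > haystack.length()) {
            return positions;
        }
        int[] target = new int[Character.MAX_VALUE + 1];  // needle character counts
        int[] current = new int[Character.MAX_VALUE + 1]; // window character counts
        for (int i = 0; i < m; i++) {
            target[needle.charAt(i)]++;
        }
        int matched = 0; // sum over all characters of min(current, target)
        for (int i = 0; i < haystack.length(); i++) {
            char in = haystack.charAt(i);
            if (current[in] < target[in]) {
                matched++; // adding a character the window still needed
            }
            current[in]++;
            if (i >= m) {
                char out = haystack.charAt(i - m);
                if (current[out] <= target[out]) {
                    matched--; // removing a character that was counted
                }
                current[out]--;
            }
            if (i >= m - 1 && matched == m) {
                positions.add(i - m + 1);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [0, 3, 6, 7, 11, 12]
        System.out.println(findMatchPositions("abbc", "abbcbbabbcaabbca"));
    }
}
```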

Vitaly Olegovitch answered Oct 26 '22 02:10


To find a permutation of a string you can use number theory. But you will have to know the 'theory' behind this algorithm in advance before you can answer the question using this algorithm.

There is a method where you can calculate a hash of a string using prime numbers. Every permutation of the same string gives the same hash value. A string that is not a permutation will usually give a different value, but this is not guaranteed for the weighted sum used here, so a matching hash is best confirmed by comparing the letters themselves.

The hash value is calculated as c1 * p1 + c2 * p2 + ... + cn * pn, where ci is a unique value for the i-th character of the string (its character code in the implementation below) and pi is the prime number assigned to that character.

Here is the implementation.

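import java.util.Arrays;

public class Main {
    // One prime per lowercase letter 'a'..'z' (the 27th entry is unused)
    static int[] primes = new int[] { 2, 3, 5, 7, 11, 13, 17, 
        19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 
        73, 79, 83, 89, 97, 101, 103 };

    public static void main(String[] args) {
        final char[] text = "abcxaaabbbccyaxbcayaaaxycab"
            .toCharArray();
        char[] word = new char[]{'a','b','c'};
        int match = val(word);
        // Hash every window of the word's length and compare it with the word's hash
        for (int i = 0; i + word.length <= text.length; i++) {
            char[] window = Arrays.copyOfRange(text, i, i + word.length);
            if (val(window) == match) {
                System.out.println(new String(window));
            }
        }
    }
    // Prime assigned to a lowercase letter; other characters are not supported
    static int p(char c) {
        return primes[c - 'a'];
    }
    // Order-independent hash: sum of prime * character code over all characters
    static int val(char[] cs) {
        int sum = 0;
        for (char c : cs) {
            sum += p(c) * c;
        }
        return sum;
    }
}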

The output of this is: abc bca cab
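
As a variation (not the formula used above), the primes can be multiplied instead of summed. By unique factorization, two strings then have the same product exactly when they contain the same letters with the same counts. The sketch below assumes lowercase a-z input and uses BigInteger so that long words cannot overflow; the names are illustrative.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class PrimeProductSearch {
    // One prime per lowercase letter 'a'..'z'
    static final int[] PRIMES = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101 };

    // Product of the primes assigned to the characters of s (order-independent)
    static BigInteger product(String s) {
        BigInteger result = BigInteger.ONE;
        for (char c : s.toCharArray()) {
            result = result.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return result;
    }

    static List<String> findPermutations(String word, String text) {
        List<String> found = new ArrayList<>();
        int m = word.length();
        if (m == 0) {
            return found;
        }
        BigInteger target = product(word);
        for (int i = 0; i + m <= text.length(); i++) {
            String window = text.substring(i, i + m);
            if (product(window).equals(target)) {
                found.add(window);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [abc, bca, cab]
        System.out.println(findPermutations("abc", "abcxaaabbbccyaxbcayaaaxycab"));
    }
}
```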

Kunukn answered Oct 26 '22 02:10


You should be able to do this in a single pass. Start by building a map that contains all the characters in the word you're searching for. So initially the map contains [a, b, c].

Now, go through the text one character at a time. The loop looks something like this, in pseudo-code.

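found_string = "";
for each character in text
    if character occurs in the word
        // if every copy is already used, give letters back from the
        // front of found_string until one copy is free again
        while character is not in map
            add first character of found_string back to map
            remove first character from found_string
        end while
        remove character from map
        append character to found_string
        if map is empty
            output found_string
            found_string = ""
            add all characters back to map
        end if
    else
        // not a permutation of the string you're searching for
        refresh map with characters from found_string
        found_string = ""
    end if
end for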

If you want unique occurrences, change the output step so that it adds the found strings to a set (or uses them as keys of a map). That'll eliminate duplicates.

There's the issue of words that contain duplicated letters. If that's a problem, make the key the letter and the value a count. 'Removing' a character means decrementing its count in the map. If the count goes to 0, then the character is in effect removed from the map.

The algorithm as written won't find overlapping occurrences. That is, given the text abcba, it will only find abc. If you want to handle overlapping occurrences, you can modify the algorithm so that when it finds a match, it moves the index back by the length of the found string minus one, so that the scan resumes at the second character of the match.
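
One possible Java sketch of the counted, single-pass variant. It departs from the pseudocode in two ways that are choices made here, not part of the answer: instead of resetting on a surplus letter it gives letters back from the front of the current candidate, and after a match it releases only the first letter, so overlapping occurrences are reported without moving the index backwards. The names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SinglePassSearch {
    static List<String> findPermutations(String word, String text) {
        List<String> found = new ArrayList<>();
        int m = word.length();
        if (m == 0) {
            return found;
        }
        // remaining[c] = copies of c from the word not yet used by the candidate
        int[] remaining = new int[Character.MAX_VALUE + 1];
        for (char c : word.toCharArray()) {
            remaining[c]++;
        }
        int start = 0; // the current candidate is text[start..i]
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            remaining[c]--;
            // Too many copies of c (or c is not in the word at all):
            // give letters back from the front until the surplus is gone
            while (remaining[c] < 0) {
                remaining[text.charAt(start)]++;
                start++;
            }
            if (i - start + 1 == m) {
                found.add(text.substring(start, i + 1));
                // Release the first letter so an overlapping match can follow
                remaining[text.charAt(start)]++;
                start++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [abc, cba]
        System.out.println(findPermutations("abc", "abcba"));
    }
}
```

Every character enters and leaves the candidate at most once, so the pass is O(n) overall.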

That was a fun puzzle. Thanks.

Jim Mischel answered Oct 26 '22 01:10