 

Matching the occurrence and pattern of characters of String2 in String1

I was asked this question in a phone interview for a summer internship, and tried to come up with an O(n*m) solution in Java (although it wasn't accurate either).

I have a function that takes 2 strings, say "common" and "cmn". It should return True because 'c', 'm', 'n' occur in the same order in "common". But if the arguments were "common" and "omn", it should return False: even though the characters occur in the same order, 'o' also appears after 'm', which fails the pattern match condition.

I have worked on it using HashMaps and ASCII arrays, but haven't found a convincing solution yet! From what I have read so far, could it be related to the Boyer-Moore or Levenshtein distance algorithms?

Hoping for respite at stackoverflow! :)

Edit: Some of the answers talk about reducing the word length or creating a hashset. But as I understand it, this question cannot be solved with hashsets, because each occurrence/repetition of a character in the first string has its own significance. PASS conditions for "common": "con", "cmn", "cm", "cn", "mn", "on", "co". FAIL conditions that may seem otherwise: "com", "omn", "mon", "om". These FAIL because "o" occurs before as well as after "m". Another example: "google", "ole" would PASS, but "google", "gol" would FAIL because "o" also appears before the second "g"!
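One way to read the PASS/FAIL lists above is as a rule on adjacent pattern characters: for each adjacent pair (a, b) in String2, no occurrence of b in String1 may come before an occurrence of a. The following is a minimal brute-force sketch of that reading, not an efficient solution. The names `OrderRule` and `matches` are made up for illustration, and treating a character that is missing from String1 as a FAIL is an assumption the question does not settle.

```java
class OrderRule {
    // Brute-force statement of the rule as read from the examples:
    // for each adjacent pair (a, b) in p, no b in s may precede an a.
    static boolean matches(String s, String p) {
        for (int k = 0; k < p.length(); k++) {
            // Assumption: a pattern character absent from s is a FAIL.
            if (s.indexOf(p.charAt(k)) < 0) {
                return false;
            }
        }
        for (int k = 0; k + 1 < p.length(); k++) {
            char a = p.charAt(k);
            char b = p.charAt(k + 1);
            for (int i = 0; i < s.length(); i++) {
                if (s.charAt(i) != a) {
                    continue;
                }
                // look for any b positioned before this occurrence of a
                for (int j = 0; j < i; j++) {
                    if (s.charAt(j) == b) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] pass = {"con", "cmn", "cm", "cn", "mn", "on", "co"};
        String[] fail = {"com", "omn", "mon", "om"};
        for (String p : pass) {
            System.out.println(p + " -> " + matches("common", p)); // true for each
        }
        for (String p : fail) {
            System.out.println(p + " -> " + matches("common", p)); // false for each
        }
    }
}
```

Under this reading, both lists for "common" and the two "google" examples come out as stated in the question.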

MadTest asked May 03 '11 02:05



3 Answers

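I think it's quite simple. Run through the pattern and, for every character, get the index of its last occurrence in the string. That index must never decrease, and the character's first occurrence must not come before the previous character's last occurrence; otherwise return false. So in pseudocode: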

index = -1
foreach c in pattern
    checkindex = string.lastIndexOf(c)
    if checkindex == -1                   //not found
        return false
    if checkindex < index
        return false
    if string.firstIndexOf(c) < index     //characters in the wrong order
        return false
    index = checkindex
return true

Edit: you could further improve the code by passing index as the starting index to the lastIndexOf method. Then you wouldn't have to compare checkindex with index and the algorithm would be faster.

Updated: Fixed a bug in the algorithm. Additional condition added to consider the order of the letters in the pattern.
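For anyone who wants to try it, here is a minimal Java sketch of the pseudocode above. It assumes that the pseudocode's firstIndexOf corresponds to `String.indexOf` and its lastIndexOf to `String.lastIndexOf`; the class name `LastIndexCheck` is made up for illustration.

```java
class LastIndexCheck {
    // Java rendering of the pseudocode: index tracks the last occurrence
    // of the previous pattern character.
    static boolean check(String string, String pattern) {
        int index = -1;
        for (int k = 0; k < pattern.length(); k++) {
            char c = pattern.charAt(k);
            int checkIndex = string.lastIndexOf(c);
            if (checkIndex == -1) {
                return false; // not found
            }
            if (checkIndex < index) {
                return false; // implied by the next check, kept to mirror the pseudocode
            }
            if (string.indexOf(c) < index) {
                return false; // characters in the wrong order
            }
            index = checkIndex;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check("common", "cmn")); // true
        System.out.println(check("common", "omn")); // false
        System.out.println(check("google", "ole")); // true
        System.out.println(check("google", "gol")); // false
    }
}
```

Each pattern character costs two scans of the string, so this sketch runs in time proportional to the product of the two lengths.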

raymi answered Oct 29 '22 04:10


An excellent question. After a couple of hours of research I think I have found a solution. First of all, let me try explaining the question in a different way.

Requirement:

Let's consider the same example, 'common' (mainString) and 'cmn' (subString). First we need to be clear that any character can repeat within the mainString and also the subString, and since it is the pattern we are concentrating on, the index of each character plays a great role too. So we need to know:

  • Index of the character (lowest and highest)

Let's keep this on hold and look at the patterns a bit more. For the word common, we need to find whether the particular pattern cmn is present or not. The different patterns possible with common are (precedence applies):

  • c -> o
  • c -> m
  • c -> n
  • o -> m
  • o -> o
  • o -> n
  • m -> m
  • m -> o
  • m -> n
  • o -> n

This precedence and comparison must hold at all times. Since precedence plays a huge role, we need to keep the indices of each unique character instead of storing the different patterns.

Solution

The first part of the solution is to create a Hash Table as follows:

  1. Create a Hash Table with each character of the mainString as a key
  2. Each entry for a unique key in the Hash Table will store two indices, i.e. lowerIndex and higherIndex
  3. Loop through the mainString and, for every new character, add an entry to the Hash with lowerIndex set to the character's current index in mainString.
  4. If a collision occurs (the character is already in the Hash), store the current index as that entry's higherIndex; do this until the end of the String

Second and main part, the pattern matching:

  1. Set Flag as True
  2. Loop through the subString and, for every character as the key, retrieve the details from the Hash.
  3. Do the same for the very next character.
  4. Just before loop increment, verify two conditions

    If highestIndex(current character) > highestIndex(next character) Then
       Pattern Fails, Flag <- False, Terminate Loop
       // This condition is applicable for almost all the cases for pattern matching
    
    Else If lowestIndex(current character) > lowestIndex(next character) Then
       Pattern Fails, Flag <- False, Terminate Loop
       // This case is explicitly for cases in which patterns like 'mon' appear
  5. Display the Flag

N.B.: Since I am not well versed in Java, I did not submit code, but someone can try implementing my idea.
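As a starting point, the two parts above could be sketched in Java roughly as follows. This is one possible reading, not a definitive implementation: the names `IndexTable`, `buildTable` and `matches` are made up, a character that occurs only once gets the same value for both indices, the flag starts out true, and a subString character that is missing from the table is treated as a failure (the steps do not say what should happen there).

```java
import java.util.HashMap;
import java.util.Map;

class IndexTable {
    // Part one: map each character of mainString to {lowestIndex, highestIndex}.
    static Map<Character, int[]> buildTable(String mainString) {
        Map<Character, int[]> table = new HashMap<>();
        for (int i = 0; i < mainString.length(); i++) {
            char c = mainString.charAt(i);
            int[] range = table.get(c);
            if (range == null) {
                table.put(c, new int[] {i, i}); // new character: both indices start at i
            } else {
                range[1] = i; // collision: only the highest index moves
            }
        }
        return table;
    }

    // Part two: compare every subString character with the one after it.
    static boolean matches(String mainString, String subString) {
        Map<Character, int[]> table = buildTable(mainString);
        boolean flag = true;
        for (int k = 0; k < subString.length(); k++) {
            int[] current = table.get(subString.charAt(k));
            if (current == null) { // assumption: unknown character fails
                flag = false;
                break;
            }
            if (k + 1 == subString.length()) {
                break; // last character, nothing left to compare with
            }
            int[] next = table.get(subString.charAt(k + 1));
            if (next == null) { // assumption: unknown character fails
                flag = false;
                break;
            }
            // the two conditions from step 4
            if (current[1] > next[1] || current[0] > next[0]) {
                flag = false;
                break;
            }
        }
        return flag;
    }
}
```

With these two comparisons the sketch gives the expected results for the "common" and "google" examples. It also accepts ("abab", "ab"), where 'a' occurs both before and after 'b'; comparing highestIndex(current) against lowestIndex(next) instead would reject that input, which may be closer to the FAIL examples in the question.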

NirmalGeo answered Oct 29 '22 03:10


I had solved this question myself in an inefficient manner, but it does give accurate results! I would appreciate it if anyone could work out more efficient code or a better algorithm from this!

Create a function "check" which takes 2 strings as arguments, s and p. Check each character of p against s: the order of appearance of each character of p must hold in s.

  1. Take character 0 from string p and traverse through the string s to find its index of first occurrence.
  2. Traverse through the filled ascii array to find any value more than the index of first occurrence.
  3. Traverse further to find the last occurrence, and update the ascii array
  4. Take character 1 from string p and traverse through the string s to find the index of its first occurrence in string s
  5. Traverse through the filled ascii array to find any value more than the index of first occurrence. if found, return False.
  6. Traverse further to find the last occurrence, and update the ascii array

As can be observed, this is a brute-force method. I guessed O(N^3); with the fixed 256-entry array, the three nested loops do about |p| * |s| * 256 steps.

public class Interview
{
    public static void main(String[] args)
    {
        if (check("google", "oge"))
            System.out.println("yes");
        else
            System.out.println("sorry!");
    }

    public static boolean check(String s, String p)
    {
        // asciiArr[c] holds the most recent index in s at which a pattern
        // character c was seen (0 if nothing has been recorded for c)
        int[] asciiArr = new int[256];
        for (int pIndex = 0; pIndex < p.length(); pIndex++) // Loop 1 over p
        {
            for (int sIndex = 0; sIndex < s.length(); sIndex++) // Loop 2 over s
            {
                if (p.charAt(pIndex) == s.charAt(sIndex))
                {
                    asciiArr[s.charAt(sIndex)] = sIndex; // record the index under the char's ASCII value

                    for (int ascIndex = 0; ascIndex < 256; ascIndex++) // Loop 3 over the ASCII array
                    {
                        // a recorded index beyond sIndex means an earlier
                        // pattern character also occurs after this one
                        if (asciiArr[ascIndex] > sIndex)
                            return false;
                    }
                }
            }
        }
        return true;
    }
}
MadTest answered Oct 29 '22 03:10