Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if a string is a substring of another string

Tags:

java

substring

I read article with nice exercise about checking is string is a substring of another.

The content of the exercise is:

Write a program that takes 2 string parameters from the command line. Program must verify if the second string is a substring of the first string (you cannot use substr, substring or any other standard function including regular expression libraries).

Each occurrence of * in the second substring means that it can be a match for zero or more characters of the first string.

Consider the example: Input string 1: abcd Input string 2: a*c Program should evaluate that the string 2 is a substring of the string 1.

Additionally asterisk (*) may be considered as a regular character, if it is preceded by a backslash (\). Backslash (\) is considered as a regular character in all cases other than preceding the asterisk (*).

I wrote simple app, which firstly check that second string is NOT longer than first(but there is one problem, when test at ("ab", "a*b") this is correct test, but the method fails):

public static boolean checkCharactersCount(String firstString, String secondString) {
    return (firstString.length() > 0 && secondString.length() > 0) &&
            (firstString.length() > secondString.length());

... and next verify is a subtring:

public static boolean checkSubstring(String firstString, String secondString) {
    int correctCharCounter = 0;
    int lastCorrectCharAtIndex = -1;

    for (int i = 0; i < secondString.length(); i++) {
        for (int j = 0; j < firstString.length(); j++) {
            if (j > lastCorrectCharAtIndex) {

                if ((secondString.charAt(i) == firstString.charAt(j)) || secondString.charAt(i) == '*') {
                    correctCharCounter++;
                    lastCorrectCharAtIndex = j;
                }

                if (correctCharCounter >= secondString.length())
                    return true;
            }
        }
    }

    return false;
}

But there are two problems:

  1. My code as above does not support character continuity(for example test: checkSubstring("abacd", "bcd") return true, but it is wrong! - should return false)
  2. Any idea how to verify special symbol as "\*" ? Sample to test (checkSubstring("abc", "\b")

How is Your vision of solution? :)

like image 478
ACz Avatar asked Dec 10 '18 12:12

ACz


3 Answers

Try this: (Comments added for explanation)

// only for non empty Strings
public boolean isSubString(String string1,String string2)
{
    // step 1: split by *, but not by \*
    List<String>list1 = new ArrayList<String>();
    char[]cs = string2.toCharArray();
    int lastIndex = 0 ;
    char lastChar = 0 ;
    int i = 0 ;
    for(; i < cs.length ; ++i)
    {
        if(cs[i]=='*' && lastChar!='\\')
        {
            list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
            //earlier buggy line:
            //list1.add(new String(cs,lastIndex,i-lastIndex));
            lastIndex = i + 1 ;
        }
        lastChar = cs[i];
    }
    if(lastIndex < i )
    {
        list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
    }
    // step 2: check indices of each string in the list
    // Note: all indices should be in proper order.
    lastIndex = 0;
    for(String str : list1)
    {
        int newIndex = string1.indexOf(str,lastIndex);
        if(newIndex < 0)
        {
            return false;
        }
        lastIndex = newIndex+str.length();
    }
    return true;
}

in case you are not allowed to use String.indexOf() then write a function public int indexOf(String string1,String string2, int index2) and replace this statement

int newIndex = string1.indexOf(str,lastInxdex);

with this statement:

int newIndex = indexOf(string1, str,lastInxdex);

========================================================

Appendix A: The code i tested:

package jdk.conf;

import java.util.ArrayList;
import java.util.List;

public class Test01 {
    public static void main(String[] args)
    {
        Test01 test01 = new Test01();
        System.out.println(test01.isSubString("abcd", "a*c"));
        System.out.println(test01.isSubString("abcd", "bcd"));
        System.out.println(test01.isSubString("abcd", "*b"));
        System.out.println(test01.isSubString("abcd", "ac"));
        System.out.println(test01.isSubString("abcd", "bd"));
        System.out.println(test01.isSubString("abcd", "b*d"));
        System.out.println(test01.isSubString("abcd", "b\\*d"));
        System.out.println(test01.isSubString("abcd", "\\*d"));
        System.out.println(test01.isSubString("abcd", "b\\*"));

        System.out.println(test01.isSubString("a*cd", "\\*b"));
        System.out.println(test01.isSubString("", "b\\*"));
        System.out.println(test01.isSubString("abcd", ""));

        System.out.println(test01.isSubString("a*bd", "\\*b"));
    }
    // only for non empty Strings
    public boolean isSubString(String string1,String string2)
    {
        // step 1: split by *, but not by \*
        List<String>list1 = new ArrayList<String>();
        char[]cs = string2.toCharArray();
        int lastIndex = 0 ;
        char lastChar = 0 ;
        int i = 0 ;
        for(; i < cs.length ; ++i)
        {
            if(cs[i]=='*' && lastChar!='\\')
            {
                list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
                lastIndex = i + 1 ;
            }
            lastChar = cs[i];
        }
        if(lastIndex < i )
        {
            list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*"));
        }
        // step 2: check indices of each string in the list
        // Note: all indices should be in proper order.
        lastIndex = 0;
        for(String str : list1)
        {
            int newIndex = string1.indexOf(str,lastIndex);
            if(newIndex < 0)
            {
                return false;
            }
            lastIndex = newIndex+str.length();
        }
        return true;
    }
}

Output:

true
true
true
false
false
true
false
false
false
false
false
true
true
like image 94
Abhishek Oza Avatar answered Oct 05 '22 13:10

Abhishek Oza


I'd do this in a couple of stages.

Let's call the potential substring p and the string which we're testing for containing the substring s.

Break down the "contains" portion as a series of questions of "does p match starting at the Nth position of s?"; obviously you go through s from the first position onward to see if p matches at any position of s.

In matching, we have the possibility of running into a "*"; in that case, we want to know if the portion of p after the * is a substring of the portion of s after matching the portion of p up until the *. That suggests a recursive routine, taking the portion to be matched and the string to match and returning true/false. When you encounter a *, form the two new strings and call yourself.

If instead you encounter a \, then you just continue the regular matching with the next character instead of making the recursive call. Given that you need to do that, I suppose it might be easiest if you build pPrime from the original p, so that you can remove the backslash(es) when they're encountered, analagous to removing the asterisk(s) from the wildcard matching.

I haven't actually written any code on this, you only asked for approach.

like image 23
arcy Avatar answered Oct 05 '22 11:10

arcy


I found this a nice challenge to do. This exercise really forces us to think on a very low level of the language and algorithms in general. No lambdas, no streams, no regex, no find, no substring, nothing. Just the old CharAt, some fors and what not. Essentially I made a lookup method which looks up for the first character of the string to be found and then another lookup which takes into consideration your rules from that point onwards. If it fails, it goes back to the first index found, adds one and it does how many iterations necessary until the end of the string. If no match is found it's supposed to return false. If only one is found, then that is enough to consider it a substring. The most important corner cases are considered at the beginning of the calculus so that if false is detected as certain that it doesn't go further. Thus '*' alone means any character match and we can escape it with \. I tried to include most corner cases and this was really a challenge. I'm not entirely sure if my code covers all cases for you but it should cover quite a few. I really wanted to help you so this is my approach and here is my code:

package com.jesperancinha.string;

public class StringExercise {

    private static final char ASTERISK = '*';
    private static final char BACKSLASH = '\\';

    public boolean checkIsSubString(String mainString, String checkString) {
        int nextIndex = getNextIndex(0, checkString.charAt(0), mainString);
        if (nextIndex == -1) {
            return false;
        }
        boolean result = checkFromIndex(nextIndex, mainString, checkString);
        while (nextIndex < mainString.length() - 1 && nextIndex > -1) {
            if (!result) {
                nextIndex = getNextIndex(nextIndex + 1, checkString.charAt(0), mainString);
                if (nextIndex > -1) {
                    result = checkFromIndex(nextIndex, mainString, checkString);
                }
            } else {
                return result;
            }
        }
        return result;
    }

    private int getNextIndex(int start, char charAt, String mainString) {
        if (charAt == ASTERISK || charAt == BACKSLASH) {
            return start;
        }
        for (int i = start; i < mainString.length(); i++) {
            if (mainString.charAt(i) == charAt) {
                return i;
            }
        }
        return -1;
    }

    private boolean checkFromIndex(int nextIndex, String mainString, String checkString) {
        for (int i = 0, j = 0; i < checkString.length(); i++, j++) {
            if (i < (checkString.length() - 2) && checkString.charAt(i) == BACKSLASH
                    && checkString.charAt(i + 1) == ASTERISK) {
                i++;
                if (mainString.charAt(j + nextIndex) == BACKSLASH) {
                    j++;
                }
                if (checkString.charAt(i) != mainString.charAt(j + nextIndex)) {
                    return false;
                }
            }
            if (i > 0 && checkString.charAt(i - 1) != BACKSLASH
                    && checkString.charAt(i) == ASTERISK) {
                if (i < checkString.length() - 1 && (j + nextIndex) < (mainString.length() - 1)
                        && checkString.charAt(i + 1) !=
                        mainString.charAt(j + nextIndex + 1)) {
                    i--;
                } else {
                    if (j + nextIndex == mainString.length() - 1
                            && checkString.charAt(checkString.length() - 1) != ASTERISK
                            && checkString.charAt(checkString.length() - 2) != BACKSLASH) {
                        return false;
                    }
                }
            } else {
                if ((j + nextIndex) < (mainString.length() - 2) &&
                        mainString.charAt(j + nextIndex)
                                != checkString.charAt(i)) {
                    return false;
                }
            }
        }
        return true;
    }

}

I have made a set of unit tests but if I would put the whole class here it would be too long and the only thing I want to show you is the test cases I implemented in the unit tests. Here is the condensed version of my unit tests for this case:

package com.jesperancinha.string;

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.api.Test;

class StringExerciseMegaTest {

    @Test
    void checkIsSubString() {
        StringExercise stringExercise = new StringExercise();
        boolean test = stringExercise.checkIsSubString("abcd", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "a\\*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("a*c", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*c", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a\\*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a\\*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "a*c");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "dwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "75tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\iauoisdf9977675tyhfgh", "riou\\iauois");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9977675tyhfgh", "riou\\\\*iauois");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\\\*977675tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*977675tyhfgh");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("\\*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*aasdwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "*aasdwer");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "bc");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "zbc");
        assertThat(test).isFalse();
        test = stringExercise.checkIsSubString("abcd", "*bc*");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("*bcd", "\\*bc*");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "a*c");
        assertThat(test).isTrue();
        test = stringExercise.checkIsSubString("abcd", "az*bc");
        assertThat(test).isFalse();
    }
}
like image 40
Joao Esperancinha Avatar answered Oct 05 '22 12:10

Joao Esperancinha