Iterating through String with .find() in Java regex

Tags: java, string, regex

I'm currently trying to solve a problem from codingbat.com with regular expressions.

I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.

Here is the prompt: Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.

wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"

etc

My code thus far:

String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);

String newStr = "";
while(m.find())
    newStr += m.group().replace(word, "");

return newStr;

The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.

For example, wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij": the "i" sits between two words, yet it is only included once.

I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

Rishi asked Nov 12 '22 18:11


1 Answer

This is a one-liner solution:

String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");

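The non-capturing group tries your edge case first, using a lookahead: if the character after the word is itself immediately followed by another word, it is captured without being consumed, so the next match can use it as its preceding character. Otherwise the pattern falls back to the usual case, which captures and consumes the following character.

Note that your requirements don't call for iteration; only your question title assumes it.

Note also that word is concatenated into the pattern as-is, so any regex metacharacters in it would be interpreted rather than matched literally. Unless you can guarantee that word contains none, use Pattern.quote(word) instead of word.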
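To illustrate why quoting matters, here is a minimal sketch. The class QuoteDemo, the helper mask and the word "a.c" are made up for this example; only Pattern.quote and String.replaceAll come from the standard library.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: replaces every occurrence of word in input with '#',
    // optionally quoting word so it is matched literally.
    static String mask(String input, String word, boolean quoted) {
        String pattern = quoted ? Pattern.quote(word) : word;
        return input.replaceAll(pattern, "#");
    }

    public static void main(String[] args) {
        String word = "a.c"; // contains the regex metacharacter '.'

        // Unquoted: '.' matches any character, so "abc" counts as an occurrence.
        System.out.println(mask("xabcx", word, false)); // x#x
        // Quoted: only the literal text "a.c" matches.
        System.out.println(mask("xabcx", word, true));  // xabcx
        System.out.println(mask("xa.cx", word, true));  // x#x
    }
}
```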

Here's a test of the usual case and the edge case, showing it works:

import java.util.regex.Pattern;

public class WordEndsTest {
    public static String wordEnds(String input, String word) {
        word = Pattern.quote(word); // match word literally, even if it contains regex metacharacters
        return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // usual case
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // edge case: one character between two words
    }
}

Output:

c13i
cxziij
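If you do want to keep the find() loop from the question, one possible sketch is to match a zero-width position at the start of each word and capture the neighbouring characters inside lookarounds, so that nothing is consumed and a shared character can be reported twice. As far as I can tell, the one-liner above needs a character on both sides of a match, so inputs such as "XY123XY" come back unchanged; the version below is meant to cover those as well. The class name WordEndsLookaround is made up for this example, and this is an alternative approach, not the one-liner's method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordEndsLookaround {
    public static String wordEnds(String str, String word) {
        String w = Pattern.quote(word); // match word literally

        // Zero-width match at the start of each occurrence of word:
        //   (?:(?<=(.))|^)  group 1 = character before, or start of input (group 1 unset)
        //   (?=w(.?))       group 2 = character after, or "" at end of input
        // DOTALL lets '.' also match line terminators.
        Pattern p = Pattern.compile("(?:(?<=(.))|^)(?=" + w + "(.?))", Pattern.DOTALL);
        Matcher m = p.matcher(str);

        StringBuilder result = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {      // null when the word starts the input
                result.append(m.group(1));
            }
            result.append(m.group(2));     // empty when the word ends the input
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // c13i
        System.out.println(wordEnds("XY123XY", "XY"));       // 13
        System.out.println(wordEnds("XY1XY", "XY"));         // 11
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // cxziij
    }
}
```

Because every match is empty, find() moves forward one character at a time on its own, so adjacent occurrences are all visited.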
Bohemian answered Nov 15 '22 11:11