How to determine where a regex failed to match using Java APIs

I have tests where I validate the output with a regex. When it fails it reports that output X did not match regex Y.

I would like to add some indication of where in the string the match failed, for example the farthest position the matcher reached in the string before backtracking. Matcher.hitEnd() covers one case of what I'm looking for, but I want something more general.

Is this possible to do?

TimK asked Apr 14 '11 16:04


2 Answers

If a match fails, Matcher.hitEnd() tells you whether the matcher reached the end of the input while trying, in which case a longer string might still have matched. In addition, you can restrict matching to a region of the input sequence. So if you have a string that cannot be matched, you can test progressively shorter prefixes to see where the match fails:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastMatch {
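    // Returns the length of the longest prefix of input that either matches
    // the pattern or on which the matcher ran out of input while matching.
    private static int indexOfLastMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        for (int i = input.length(); i > 0; --i) {
            // region() resets the matcher and returns the same instance.
            Matcher region = matcher.region(0, i);
            if (region.matches() || region.hitEnd()) {
                return i;
            }
        }

        // No non-empty prefix matched or hit the end of its region.
        return 0;
    }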

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[A-Z]+[0-9]+[a-z]+");
        String[] samples = {
                "*ABC",
                "A1b*",
                "AB12uv",
                "AB12uv*",
                "ABCDabc",
                "ABC123X"
        };

        for (String sample : samples) {
            int lastMatch = indexOfLastMatch(pattern, sample);
            System.out.println(sample + ": last match at " + lastMatch);
        }
    }
}

The output of this class is:

*ABC: last match at 0
A1b*: last match at 3
AB12uv: last match at 6
AB12uv*: last match at 6
ABCDabc: last match at 4
ABC123X: last match at 6
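
Since the question is about improving a test failure message, the index can be turned into a caret marker under the offending output. The following is only a sketch under some assumptions: the class name MatchFailureReport and the methods farthestIndex and describeFailure are hypothetical, the prefix scan is the same one as in indexOfLastMatch above, and the caret only lines up for single-line input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java.util.regex.
public class MatchFailureReport {
    // Same prefix scan as indexOfLastMatch: the longest prefix that matches
    // or on which the matcher ran out of input.
    static int farthestIndex(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        for (int i = input.length(); i > 0; --i) {
            matcher.region(0, i);
            if (matcher.matches() || matcher.hitEnd()) {
                return i;
            }
        }
        return 0;
    }

    // Builds a three-line message: the regex, the input, and a caret
    // placed under the index where matching stopped.
    static String describeFailure(Pattern pattern, String input) {
        int index = farthestIndex(pattern, input);
        StringBuilder sb = new StringBuilder();
        sb.append("Output did not match regex ").append(pattern.pattern()).append('\n');
        sb.append(input).append('\n');
        for (int i = 0; i < index; i++) {
            sb.append(' ');
        }
        sb.append("^ matching stopped at index ").append(index);
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[A-Z]+[0-9]+[a-z]+");
        System.out.println(describeFailure(pattern, "ABC123X"));
    }
}
```

For "ABC123X" this prints the regex, the input, and a caret under the X at index 6.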
Andreas Mayer answered Oct 31 '22 20:10

You can iterate over the string, dropping one more character from its end on each iteration, and check hitEnd() on each resulting prefix:

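import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Only proper prefixes are tested, on the assumption that the full input
// has already failed to match.
int farthestPoint(Pattern pattern, String input) {
    for (int i = input.length() - 1; i > 0; i--) {
        Matcher matcher = pattern.matcher(input.substring(0, i));
        // A prefix counts only if it does not match on its own but the
        // matcher ran out of input while trying.
        if (!matcher.matches() && matcher.hitEnd()) {
            return i;
        }
    }
    return 0;
}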
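
To see how this variant behaves, it can be run against the same samples and pattern used in the other answer. This is a sketch only: the wrapper class FarthestPointDemo and its main method are hypothetical, and farthestPoint is copied unchanged from the snippet above (made static so it can be called from main).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only farthestPoint comes from the answer.
public class FarthestPointDemo {
    static int farthestPoint(Pattern pattern, String input) {
        for (int i = input.length() - 1; i > 0; i--) {
            Matcher matcher = pattern.matcher(input.substring(0, i));
            // Skips prefixes that match completely; accepts the first
            // non-matching prefix on which the matcher hit the end.
            if (!matcher.matches() && matcher.hitEnd()) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[A-Z]+[0-9]+[a-z]+");
        String[] samples = {"*ABC", "A1b*", "AB12uv", "AB12uv*", "ABCDabc", "ABC123X"};
        for (String sample : samples) {
            System.out.println(sample + ": farthest point " + farthestPoint(pattern, sample));
        }
    }
}
```

Because prefixes that match completely are skipped and the full input is never tested, the results can differ from the region-based version: for "A1b*" this variant should report 2 (the prefix "A1"), where the region-based version reports 3.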
Uri Agassi answered Oct 31 '22 18:10