
Java: How do I determine why a regular expression pattern match fails?

Tags:

java

regex

I am using a regular expression to check whether a string matches a pattern, but I also want to know why a match fails.

For example, say I have a pattern of "N{1,3}Y". I match it against string "NNNNY". I would like to know that it failed because there were too many Ns. Or if I match it against string "XNNY", I would like to know that it failed because an invalid character "X" was in the string.

From looking at the Java regular expression package API (java.util.regex), additional information only seems to be available from the Matcher class when the match succeeds.

Is there a way to get this information? Or are regular expressions even an option in this scenario?

Jin Kim asked Apr 18 '11 19:04


2 Answers

I guess you should use a parser, rather than simple regular expressions.

Regular expressions are good at finding matches in a string, but not so good at reporting non-matches, let alone explaining why a match failed.
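
As a rough sketch of what such a parser could look like for the question's pattern N{1,3}Y, here is a small hand-written scanner. The class name NyValidator, the method explainFailure and the wording of the messages are all made up for illustration; it only handles this one pattern and is not a general replacement for java.util.regex.

```java
import java.util.Optional;

public class NyValidator {
    // Hypothetical helper: returns an empty Optional when the input matches
    // N{1,3}Y, otherwise a human-readable reason for the failure.
    public static Optional<String> explainFailure(String input) {
        int i = 0;
        // Consume the leading run of 'N' characters.
        while (i < input.length() && input.charAt(i) == 'N') {
            i++;
        }
        int nCount = i;
        if (i == input.length()) {
            return Optional.of("input ended at index " + i + " before the required 'Y'");
        }
        char c = input.charAt(i);
        if (c != 'Y') {
            return Optional.of("invalid character '" + c + "' at index " + i);
        }
        if (nCount == 0) {
            return Optional.of("expected at least one 'N' before 'Y'");
        }
        if (nCount > 3) {
            return Optional.of("too many 'N' characters: " + nCount + " (maximum is 3)");
        }
        if (i + 1 < input.length()) {
            return Optional.of("unexpected trailing character '" + input.charAt(i + 1)
                    + "' at index " + (i + 1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"NNY", "NNNNY", "XNNY"}) {
            System.out.println(s + " -> " + explainFailure(s).orElse("matches"));
        }
    }
}
```

For the two strings from the question this reports "too many 'N' characters: 4 (maximum is 3)" for NNNNY and "invalid character 'X' at index 0" for XNNY.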

Paulo Santos answered Nov 15 '22 12:11


This may work, but I don't know whether it is what you need.

matches fails unless the whole input sequence matches the pattern, but you can still use find to check whether some part of the sequence contains the pattern, and from that work out why the full match failed:

import java.util.regex.*;
import static java.lang.System.out;

class F {
    public static void main( String... args ) {
        String input = args[0];          // string to test, taken from the command line
        String re = "N{1,3}Y";
        Pattern p = Pattern.compile(re);
        Matcher m = p.matcher(input);
        // matches() succeeds only if the entire input matches the pattern
        out.printf("Evaluating: %s on %s%nMatched: %s%n", re, input, m.matches() );
        for( int i = 0 ; i < input.length() ; i++ ) {
           out.println();
           // find(int) resets the matcher and searches from index i onwards
           boolean found = m.find(i);
           if( !found ) {
               continue;
           }
           int s = m.start();
           int e = m.end();
           // resume the search just after the start of this match
           i = s;
           // print the input with the matched part in square brackets
           out.printf("m.start[%s]%n"
                     +"m.end[%s]%n"
                     +"%s[%s]%s%n",s,e,
                     input.substring(0,s),
                     input.substring(s,e),
                     input.substring(e) );
        }

    }
}

Output:

C:\Users\oreyes\java\re>java F NNNNY
Evaluating: N{1,3}Y on NNNNY
Matched: false

m.start[1]
m.end[5]
N[NNNY]

m.start[2]
m.end[5]
NN[NNY]

m.start[3]
m.end[5]
NNN[NY]


C:\Users\oreyes\java\re>java F XNNY
Evaluating: N{1,3}Y on XNNY
Matched: false

m.start[1]
m.end[4]
X[NNY]

m.start[2]
m.end[4]
XN[NY]

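The part in square brackets is what the pattern matched; anything outside the brackets is what kept the whole string from matching. In the first output, N[NNNY], you can tell there were too many N's; in the second, X[NNY], there was an X present.

Here's another output: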

C:\Users\oreyes\java\re>java F NYXNNXNNNNYX
Evaluating: N{1,3}Y on NYXNNXNNNNYX
Matched: false

m.start[0]
m.end[2]
[NY]XNNXNNNNYX

m.start[7]
m.end[11]
NYXNNXN[NNNY]X

m.start[8]
m.end[11]
NYXNNXNN[NNY]X

m.start[9]
m.end[11]
NYXNNXNNN[NY]X

The pattern is there but the whole expression didn't match.
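
The same idea can be condensed into a small helper that reports which parts of the input fell outside the first match. This is only a sketch: the class name FindDiagnosis and the method diagnose are invented for illustration, and the result still has to be interpreted by a person (it shows the leftover text, it does not name the rule that was broken).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDiagnosis {
    // Hypothetical helper: describes how far the input is from a full match.
    public static String diagnose(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (m.matches()) {
            return "full match";
        }
        // find(0) resets the matcher and searches from the start of the input.
        if (!m.find(0)) {
            return "pattern not found anywhere in the input";
        }
        String prefix = input.substring(0, m.start());
        String suffix = input.substring(m.end());
        return "pattern found at [" + m.start() + "," + m.end() + "), unmatched prefix \""
                + prefix + "\", unmatched suffix \"" + suffix + "\"";
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("N{1,3}Y");
        for (String s : new String[] {"NNY", "NNNNY", "XNNY", "XYZ"}) {
            System.out.println(s + ": " + diagnose(p, s));
        }
    }
}
```

For NNNNY the unmatched prefix is "N" (one N too many), and for XNNY it is "X" (the invalid character).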

It's a bit hard to understand from the documentation how find, matches and lookingAt work (at least it was for me), but I hope this example helps you figure it out.

matches is like /^YOURPATTERNHERE$/

lookingAt is like /^YOURPATTERNHERE/

find is like /YOURPATTERNHERE/
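
To see the three methods side by side, here is a small sketch that runs each of them against the same inputs. The class name AnchorDemo and the method describe are made up for illustration; a fresh Matcher is created for every call so that no state carries over between the methods.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Runs matches(), lookingAt() and find() on the same input, each on a fresh Matcher.
    public static String describe(Pattern p, String input) {
        boolean matches = p.matcher(input).matches();     // whole input must match
        boolean lookingAt = p.matcher(input).lookingAt(); // match must start at index 0
        boolean find = p.matcher(input).find();           // match may occur anywhere
        return input + " matches=" + matches + " lookingAt=" + lookingAt + " find=" + find;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("N{1,3}Y");
        for (String s : new String[] {"NNY", "NNYX", "XNNY", "NNNNY"}) {
            System.out.println(describe(p, s));
        }
    }
}
```

With N{1,3}Y, only NNY passes all three; NNYX fails matches but passes lookingAt and find; XNNY and NNNNY pass only find.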

I hope this helps.

OscarRyz answered Nov 15 '22 13:11