
Strange behaviour in Java regex

Tags: java, regex
The following code does not find the string "MOVE", which is present in the myStr variable:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
       String myStr = "    ELSE  MOVE   EXT-LNGSHRT-AMT-C TO WK-UNSIGNED-LNGSHRT-AMT  COMPUTE WK-SHORT-AMT = EXT-LNGSHRT-AMT-C * -1.";
       String verbsRegex = "\\s+(ACCEPT|ADD|ALTER|CALL|CANCEL|CLOSE|COMPUTE|DELETE|DISPLAY|DIVIDE|ELSE|EXIT|EVALUATE|EXEC|GO|GOBACK|IF|INITIALIZE|INSPECT|INVOKE|MERGE|MOVE|MULTIPLY|OPEN|PERFORM|READ|RELEASE|RETURN|REWRITE|SEARCH|SET|SORT|START|STOP|STRING|SUBTRACT|UNSTRING|WRITE|COPY|CONTINUE|WHEN)\\s+";

       Pattern p = Pattern.compile(verbsRegex);
       Matcher m = p.matcher(myStr);
       System.out.println("------------------------------------");
       while (m.find()) {
           System.out.println(myStr.substring(m.start(),m.end()));
           System.out.println("("+ m.group(1) + ")");
       }
       System.out.println("------------------------------------");
    }
}

If I change myStr to something like

       String myStr = "   MOVE  ELSE  MOVE   EXT-LNGSHRT-AMT-C TO WK-UNSIGNED-LNGSHRT-AMT  COMPUTE WK-SHORT-AMT = EXT-LNGSHRT-AMT-C * -1.";

Java starts returning MOVE, but in this case ELSE gets missed!

Any explanation for this behavior please? Am I missing something obvious here?

Thanks in advance.

Chaitanya R asked Jun 06 '26 15:06


2 Answers

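The \s+ at the end of the pattern clashes with the \s+ at the beginning. Successive find() calls never overlap, and the trailing \s+ is greedy, so the match for ELSE consumes all the white-space up to the word MOVE. The next search starts at the M, leaving no white-space to the left of it for the leading \s+, so MOVE doesn't match. The second string fails the same way: the match for the first MOVE swallows the white-space before ELSE.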
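To see the consumed white-space directly, here is a minimal sketch that prints the span of each match. The class name SpanDemo, the helper spans() and the shortened verb list are only for illustration; they are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanDemo {
    // Shortened verb list (illustrative); same shape as the pattern in the question.
    static final Pattern VERBS = Pattern.compile("\\s+(ELSE|MOVE|COMPUTE)\\s+");

    // Returns each match as "VERB[start,end)" so the consumed range is visible.
    static List<String> spans(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = VERBS.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // MOVE starts at index 8, exactly where the ELSE match ended,
        // so no white-space is left in front of it.
        System.out.println(spans("  ELSE  MOVE  X TO Y")); // prints [ELSE[0,8)]
    }
}
```

The ELSE match covers indices 0 to 8, which includes both spaces in front of MOVE, so the next find() has nothing to feed the leading \s+.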

Change both \s+ to \s+? and MOVE matches. But be aware that it still requires each matched verb to have its own one-or-more white-space characters on both sides, so two verbs separated by a single space won't both match. A word boundary or lookaround can solve this.
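As a rough sketch of the lookaround option, the trailing \s+ can become a lookahead (?=\s), which checks for white-space without consuming it. The class name VerbLookahead, the helper verbs() and the shortened verb list are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerbLookahead {
    // (?=\s) asserts that white-space follows but leaves it unconsumed,
    // so the next match can still use it as its leading \s+.
    // Shortened verb list (illustrative only).
    static final Pattern VERBS = Pattern.compile("\\s+(ELSE|MOVE|COMPUTE)(?=\\s)");

    // Collects the captured verb of every match, in order.
    static List<String> verbs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = VERBS.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Only a single space separates ELSE and MOVE here.
        System.out.println(verbs("  ELSE MOVE  X TO Y  COMPUTE Z = 1."));
        // prints [ELSE, MOVE, COMPUTE]
    }
}
```

Like the original pattern, this still needs white-space before each verb, so a verb at the very start of the string is not matched.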

linden2015 answered Jun 09 '26 05:06


Instead of \s+ you can use \b (word boundaries) to match any word in the group, so your regex should look like this:

\\b(ACCEPT|...|WHEN)\\b

Output:

------------------------------------
ELSE
(ELSE)
MOVE
(MOVE)
COMPUTE
(COMPUTE)
------------------------------------
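One caveat worth checking against your data: \b treats a hyphen as a boundary, so a verb embedded in a hyphenated name would also match. The sketch below compares \b with a pair of negative lookarounds that reject word characters and hyphens on either side. The class name HyphenAwareVerbs, the helper starts(), the identifier WK-MOVE-AMT and the shortened verb list are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenAwareVerbs {
    // Plain word boundaries, as in the answer (verb list shortened for illustration).
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b(ELSE|MOVE|COMPUTE)\\b");

    // Assumption: a verb should not touch a word character or a hyphen on either side.
    static final Pattern HYPHEN_AWARE =
            Pattern.compile("(?<![\\w-])(ELSE|MOVE|COMPUTE)(?![\\w-])");

    // Returns the start index of every match of the given pattern.
    static List<Integer> starts(Pattern p, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "MOVE WK-MOVE-AMT TO X"; // hypothetical input
        System.out.println(starts(WORD_BOUNDARY, line)); // prints [0, 8]
        System.out.println(starts(HYPHEN_AWARE, line));  // prints [0]
    }
}
```

For the string in the question both patterns give the same three verbs, so this only matters if verb names can appear inside hyphenated identifiers.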
YCF_L answered Jun 09 '26 03:06


