
Negating literal strings in a Java regular expression

Tags: java, regex

Regular expressions seem to return the longest possible match by default. For instance:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
        // .* is greedy, so it runs to the last Kent in the string
        Pattern p = Pattern.compile("Clark.*Kent", Pattern.CASE_INSENSITIVE);
        Matcher myMatcher = p.matcher(s);
        int i = 1;
        while (myMatcher.find()) {
            System.out.println(i++ + ". " + myMatcher.group());
        }
    }
}

generates the output

  1. ClarkRalphKentGuyGreenGardnerClarkSupermanKent

I would like this output instead:

  1. ClarkRalphKent
  2. ClarkSupermanKent

I have been trying patterns like:

 Pattern p = Pattern.compile("Clark[^((Kent)*)]Kent", Pattern.CASE_INSENSITIVE);

That doesn't work, but it shows what I'm after: I want each substring that runs from Clark to the next Kent, with no other occurrence of Kent in between.
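
A note on why the bracketed attempt cannot work: square brackets always define a character class, so `[^((Kent)*)]` stands for exactly one character that is none of `(`, `)`, `*`, `K`, `e`, `n`, `t`, not for "any text other than the word Kent". The sketch below is only an illustration of that behaviour; the class name `CharacterClassDemo` and the helper `found` are made up for this example.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // The attempted pattern from the question, unchanged.
    static final Pattern ATTEMPT =
            Pattern.compile("Clark[^((Kent)*)]Kent", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true when the attempted pattern occurs anywhere in input.
    static boolean found(String input) {
        return ATTEMPT.matcher(input).find();
    }

    public static void main(String[] args) {
        // Exactly one character sits between the names, so the class can match it.
        System.out.println(found("ClarkXKent"));     // true
        // Five characters sit between the names; the class only covers one.
        System.out.println(found("ClarkRalphKent")); // false
    }
}
```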

This string:

ClarkRalphKentGuyGreenGardnerBruceBatmanKent

should generate the output

  1. ClarkRalphKent

Nathan Spears asked Dec 07 '22 08:12


2 Answers

Greedy versus reluctant matching is your friend here.

Try: Clark.+?Kent
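
A minimal sketch of that suggestion applied to the two strings from the question. The class name `ReluctantDemo` and the helper `findAll` are made up for illustration; only `Pattern` and `Matcher` come from `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // .+? is reluctant: it stops at the first Kent that completes a match.
        System.out.println(findAll("Clark.+?Kent",
                "ClarkRalphKentGuyGreenGardnerClarkSupermanKent"));
        // prints [ClarkRalphKent, ClarkSupermanKent]
        System.out.println(findAll("Clark.+?Kent",
                "ClarkRalphKentGuyGreenGardnerBruceBatmanKent"));
        // prints [ClarkRalphKent]
    }
}
```

After the first match, `find()` resumes searching right after it, which is why the second `Clark...Kent` run is picked up as a separate match.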

Gareth Davis answered Dec 09 '22 23:12


You want a "reluctant" rather than a "greedy" quantifier. Simply putting a ? after your * should do the trick, which turns the pattern into Clark.*?Kent.
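
A small sketch of how the greedy and reluctant forms compare, including the one edge case where `.*?` and `.+?` differ: an input with nothing between the two names. The class name `StarVersusPlus` and the helper `firstMatch` are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVersusPlus {
    // Hypothetical helper: returns the first match, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "ClarkRalphKentGuyGreenGardnerClarkSupermanKent";
        // Greedy: .* grabs as much as it can, so the whole string is returned.
        System.out.println(firstMatch("Clark.*Kent", s));
        // Reluctant: .*? grabs as little as it can, giving ClarkRalphKent.
        System.out.println(firstMatch("Clark.*?Kent", s));
        // Edge case: .*? may match zero characters, .+? needs at least one.
        System.out.println(firstMatch("Clark.*?Kent", "ClarkKent")); // ClarkKent
        System.out.println(firstMatch("Clark.+?Kent", "ClarkKent")); // null
    }
}
```

For the strings in the question the two reluctant forms behave the same; the choice only matters if an empty gap between Clark and Kent should count as a match.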

Michael Borgwardt answered Dec 09 '22 22:12