
Find ASCII "arrows" in text

Tags: java, regex

I'm trying to find all occurrences of "arrows" in a text. For example, in

"<----=====><==->>"

the arrows are:

"<----", "=====>", "<==", "->", ">"

This works:

    String[] patterns = {"<=*", "<-*", "=*>", "-*>"};
    for (String p : patterns) {
      Matcher A = Pattern.compile(p).matcher(s);
      while (A.find()) {
        System.out.println(A.group());
      }
    }

but this doesn't:

    String p = "<=*|<-*|=*>|-*>";
    Matcher A = Pattern.compile(p).matcher(s);
    while (A.find()) {
      System.out.println(A.group());
    }

No idea why. It often reports "<" instead of "<====" or similar.

What is wrong?

ulver asked Dec 14 '22 03:12



2 Answers

Solution

The following program is one possible solution to the question:

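import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class A {
  public static void main(String[] args) {
    // Use + so each alternative needs at least one shaft character;
    // the bare "<" and ">" go last as fallbacks.
    String p = "<=+|<-+|=+>|-+>|<|>";
    Matcher m = Pattern.compile(p).matcher(args[0]);
    // Print every arrow found, in order of appearance.
    while (m.find()) {
      System.out.println(m.group());
    }
  }
}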

Run #1:

$ java A "<----=====><<---<==->>==>"
<----
=====>
<
<---
<==
->
>
==>

Run #2:

$ java A "<----=====><=><---<==->>==>"
<----
=====>
<=
>
<---
<==
->
>
==>

Explanation

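An asterisk (*) matches zero or more occurrences of the preceding element; a plus (+) matches one or more. Thus <-* can match a bare <, whereas <-+ needs at least <- and also matches any longer version (such as <--------). In the original pattern, the first alternative <=* already succeeds on a lone < by matching zero = characters, so the later alternatives are never tried at that position. Using + and listing the bare < and > last avoids this.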
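To make the difference concrete, here is a minimal sketch that runs the question's pattern and the pattern from this answer against the same input, "<---". The class name StarVsPlusDemo and the helper findAll are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; the class and method names are hypothetical.
class StarVsPlusDemo {
  // Collects every match that Matcher.find() reports for the regex, in order.
  static List<String> findAll(String regex, String input) {
    List<String> matches = new ArrayList<>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find()) {
      matches.add(m.group());
    }
    return matches;
  }

  public static void main(String[] args) {
    String input = "<---";
    // Pattern from the question: "<=*" succeeds with zero '=' characters.
    System.out.println(findAll("<=*|<-*|=*>|-*>", input));     // prints [<]
    // Pattern from this answer: "<=+" fails here, so "<-+" gets its turn.
    System.out.println(findAll("<=+|<-+|=+>|-+>|<|>", input)); // prints [<---]
  }
}
```

With the question's pattern, the dashes after the < are left over and never become part of any match, because none of the alternatives can start on a - that is not followed by a >.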

Dave Jarvis answered Dec 15 '22 16:12



When you match "<=*|<-*|=*>|-*>" against the string "<---", the first alternative, "<=*", matches, because * allows zero occurrences. Java's quantifiers are greedy, but alternation is ordered: the engine tries the alternatives from left to right and takes the first one that succeeds, without checking whether a later alternative would give a longer match.
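The left-to-right behaviour is easy to see with a two-alternative pattern. The sketch below is only an illustration (the class AlternationOrderDemo and the helper firstMatch are invented names); it swaps the order of the same two alternatives and prints what find() returns first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; the class and method names are hypothetical.
class AlternationOrderDemo {
  // Returns the first match of the regex in the input, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    // Short alternative first: it succeeds, so "<-+" is never tried.
    System.out.println(firstMatch("<|<-+", "<---")); // prints <
    // Long alternative first: the whole arrow is matched.
    System.out.println(firstMatch("<-+|<", "<---")); // prints <---
    // The bare "<" still works as a fallback when "<-+" fails.
    System.out.println(firstMatch("<-+|<", "<<"));   // prints <
  }
}
```

So if you want the longest arrow, put the more specific alternatives first and the bare "<" or ">" last.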

Kevin Peterson answered Dec 15 '22 17:12
