
How to iterate over regex expression

Tags:

java

regex

Let's say I have the following String:

name1=gil;name2=orit;

I want to find all matches of name=value and make sure that the whole string matches the pattern.

So I did the following:

  1. Ensure that the whole pattern matches what I want.

    Pattern p = Pattern.compile("^((\\w+)=(\\w+);)*$");
    Matcher m = p.matcher(line);
    if (!m.matches()) {
        return false;
    }
    
  2. Iterate over the pattern name=value

    Pattern p = Pattern.compile("(\\w+)=(\\w+);");
    Matcher m = p.matcher(line);
    while (m.find()) {
        map.put(m.group(1), m.group(2));
    }
    

Is there some way to do this with one regex?

gilsilas asked May 29 '13 14:05


2 Answers

You can validate and iterate over matches with one regex by:

  • Ensuring there are no unmatched characters between matches (e.g. name1=x;;name2=y;) by putting \G at the start of the regex, which means "the end of the previous match" (or the start of the input for the first match).

  • Checking whether we've reached the end of the string on our last match by comparing the length of our string to Matcher.end(), which returns the offset after the last character matched.

Something like:

String line = "name1=gil;name2=orit;";
Pattern p = Pattern.compile("\\G(\\w+)=(\\w+);");
Matcher m = p.matcher(line);
int lastMatchPos = 0;
while (m.find()) {
   System.out.println(m.group(1));
   System.out.println(m.group(2));
   lastMatchPos = m.end();
}
if (lastMatchPos != line.length())
   System.out.println("Invalid string!");
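
As a minimal sketch, the loop above could be wrapped into a reusable method that fills a map, as the question wanted. The class name PairParser and the method parse are hypothetical, chosen only for illustration. It assumes an unmatched remainder should be reported by returning null:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are not from the answer.
class PairParser {
    // \G anchors each match at the end of the previous one
    // (or at the start of the input for the first match).
    private static final Pattern PAIR = Pattern.compile("\\G(\\w+)=(\\w+);");

    // Returns the name=value pairs in order, or null if part of the line is unmatched.
    static Map<String, String> parse(String line) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        int lastMatchPos = 0;
        while (m.find()) {
            map.put(m.group(1), m.group(2));
            lastMatchPos = m.end();
        }
        // Valid only if the chain of matches reached the end of the string.
        return lastMatchPos == line.length() ? map : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("name1=gil;name2=orit;")); // prints {name1=gil, name2=orit}
        System.out.println(parse("name1=x;;name2=y;"));     // prints null
    }
}
```

An empty string yields an empty map here, which mirrors the `*` quantifier in the question's validation pattern. Reject it explicitly if that is not wanted.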


Bernhard Barker answered Nov 05 '22 14:11


You have to enable multiline mode for "^" and "$" to match at the start and end of each line, rather than only at the start and end of the whole input.

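Pattern p = Pattern.compile("^(?:(\\w+)=(\\w+);)*$", Pattern.MULTILINE);
Matcher m = p.matcher(line);
while (m.find()) {
    // Java keeps only the last repetition of a repeated group, so each
    // matching line contributes just its final name=value pair here.
    for (int i = 0; i + 2 <= m.groupCount(); i += 2) {
        if (m.group(i + 1) != null) {   // groups are null for an empty line
            map.put(m.group(i + 1), m.group(i + 2));
        }
    }
}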

The comments were right: you still have to iterate through the matching groups for each line, and the outer group should be a non-capturing group, (?:...).
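
One caveat worth checking before relying on this approach: in java.util.regex, a capturing group inside a repeated group retains only its last repetition, so a single match of the pattern above exposes just the final pair on a line. A small sketch to illustrate this, where the class name RepeatedGroupDemo and the method lastPair are made up for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class RepeatedGroupDemo {
    // Returns "name=value" built from groups 1 and 2 of a full-line match,
    // or null if the line does not match or contains no pairs.
    static String lastPair(String line) {
        Pattern p = Pattern.compile("^(?:(\\w+)=(\\w+);)*$", Pattern.MULTILINE);
        Matcher m = p.matcher(line);
        if (!m.find() || m.group(1) == null) {
            return null;
        }
        return m.group(1) + "=" + m.group(2);
    }

    public static void main(String[] args) {
        // Only the final repetition is retained by the capturing groups.
        System.out.println(lastPair("name1=gil;name2=orit;")); // prints name2=orit
    }
}
```

To collect every pair, the pairs still have to be matched one at a time, for example with repeated find() calls on a pattern that matches a single pair.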

weaselflink answered Nov 05 '22 13:11