I am learning about Java regexes, and I noticed the following operator:
\\1*
I'm having a hard time figuring out what it means (searching the web didn't help). For example, what is the difference between these two options:
Pattern p1 = Pattern.compile("(a)\\1*"); // option1
Pattern p2 = Pattern.compile("(a)"); // option2
Matcher m1 = p1.matcher("a");
Matcher m2 = p2.matcher("a");
m1.find(); // a match must be attempted before group() can be called
m2.find();
System.out.println(m1.group(0));
System.out.println(m2.group(0));
Result:
a
a
Thanks!
\\1 is a back reference. In this case it refers to the first capturing group, which is (a) here.
So (a)\\1* is equivalent to (a)a* in this particular case.
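The equivalence holds only because this group can match exactly one thing. A back reference repeats the text the group actually captured, not the group's pattern. Here is a minimal sketch of that difference; the class name BackrefDemo and the helper firstMatch are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \\1 repeats the captured text: once the group has captured "a", it cannot match "b".
        System.out.println(firstMatch("(a|b)\\1*", "aab"));   // aa
        // Repeating the group's pattern instead accepts either letter each time.
        System.out.println(firstMatch("(a|b)(a|b)*", "aab")); // aab
    }
}
```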
Here is an example that shows the difference:
Pattern p1 = Pattern.compile("(a)\\1*");
Pattern p2 = Pattern.compile("(a)");
Matcher m1 = p1.matcher("aa");
Matcher m2 = p2.matcher("aa");
m1.find();
System.out.println(m1.group());
m2.find();
System.out.println(m2.group());
Output:
aa
a
As you can see, when the input contains several consecutive a characters, the first regular expression matches all of them, while the second one matches only the first.
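Note that in the first pattern the capturing group itself still holds a single a; the rest of the match comes from the back reference. A small sketch that prints both the whole match and group 1 (the class name GroupVsMatchDemo and the helper matchAndGroup are hypothetical, for illustration only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVsMatchDemo {
    // Hypothetical helper: returns {whole match, group 1} for the first match, or null if none.
    static String[] matchAndGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] r = matchAndGroup("(a)\\1*", "aa");
        System.out.println(r[0]); // aa (the whole match)
        System.out.println(r[1]); // a  (what group 1 captured)
    }
}
```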
\\1* looks for a again, 0 or more times. An example using (a)\\1+ may be easier to understand; it looks for runs of at least two a characters:
Pattern p1 = Pattern.compile("(a)\\1+");
Matcher m1 = p1.matcher("aaaaabbaaabbba");
while (m1.find()) System.out.println(m1.group());
The output will be:
aaaaa
aaa
The last a is not matched because it is not repeated.