Is there a way to reuse a consumed character of the source in pattern matching?
For example, suppose I want to find matches of the regular expression (a+b+|b+a+),
i.e. one or more a followed by one or more b, or vice versa.
Suppose the input is aaaabbbaaaaab
Then the output using the regex would be
aaaabbb
aaaaab
How can I get the output to be
aaaabbb
bbbaaaaa
aaaaab
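For reference, here is a minimal sketch of the plain matching described above, assuming Java's java.util.regex. The class name PlainMatchDemo and the helper matches are made up for illustration. Each find() resumes where the previous match ended, which is why the bbb run cannot be reused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainMatchDemo {
    // Hypothetical helper: collects every whole match of the regex in the input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Characters consumed by one match are not available to the next one.
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [aaaabbb, aaaaab]; bbbaaaaa is missing because bbb was already consumed.
        System.out.println(matches("a+b+|b+a+", "aaaabbbaaaaab"));
    }
}
```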
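Try it this way:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
// The lookahead captures the full match without consuming it; the second group consumes one character at a run boundary.
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find()) {
    System.out.println(m.group(1));
}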
This regex uses lookaround mechanisms and will find every (a+b+|b+a+) that is at the start ^ of the input, or starts with a b that is preceded by a, or starts with an a that is preceded by b.
Output:
aaaabbb
bbbaaaaa
aaaaab
Is ^ really needed in this regular expression?
Yes, without ^ this regex wouldn't capture the aaaabbb placed at the start of the input.
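A small sketch to check this, assuming the same input as in the question. The class name AnchorDemo and the helper captured are hypothetical; the only difference between the two patterns is the ^ alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> captured(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // With ^: prints [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(captured("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // Without ^: prints [bbbaaaaa, aaaaab], so the match at the start of the input is lost
        System.out.println(captured("(?=(a+b+|b+a+))((?<=a)b|(?<=b)a)", data));
    }
}
```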
If I hadn't added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), this regex would match
aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab
so I needed to limit these results to only those that start with an a that has b before it (without including that b in the match, so lookbehind was perfect for that) and those that start with a b that is preceded by a.
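The unfiltered list above can be reproduced with the lookahead alone. This is only a sketch; the class name LookaheadOnlyDemo and the helper captured are made up. Because the lookahead consumes nothing, every match is zero-length and the matcher retries at each following position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOnlyDemo {
    // Hypothetical helper: collects capture group 1 of every (zero-length) match.
    static List<String> captured(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // After an empty match, find() moves on by one character, so each position is tried.
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one captured string per starting position, 12 in total for this input.
        for (String s : captured("(?=(a+b+|b+a+))", "aaaabbbaaaaab")) {
            System.out.println(s);
        }
    }
}
```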
But let's not forget about an a or b that is placed at the start of the string and is not preceded by anything. To include those we can use ^.
Maybe it will be easier to show this idea with this regex:
(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)
(?<=^|a)b will match a b that is placed at the start of the string or has a before it.
(?<=^|b)a will match an a that is placed at the start of the string or has b before it.
You can simulate this with lookbehind:
((?<=a)b+|(?<=b)a+)
This outputs
bbb aaaaa b
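A quick sketch to confirm that output, assuming the same input string as in the question. The class name LookbehindOnlyDemo and the helper matches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindOnlyDemo {
    // Hypothetical helper: collects every whole match of the regex in the input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [bbb, aaaaa, b]: each run that directly follows a run of the other letter.
        System.out.println(matches("((?<=a)b+|(?<=b)a+)", "aaaabbbaaaaab"));
    }
}
```

Note that this returns only the second run of each pair, not the combined strings such as aaaabbb, so the runs would still have to be joined with the ones before them to get the output asked for.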