There is a Java Regex question: Given a string, if the "*" is at the start or the end of the string, keep it, otherwise, remove it. For example:
*
--> *
**
--> **
*******
--> **
*abc**def*
--> *abcdef*
The answer is:
str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
I tried the answer on my machine and it works, but I don't know how it works.
From my understanding, all matched substrings should be replaced with $1$2. However, it behaves as if (^\\*) is replaced with $1, (\\*$) is replaced with $2, and \\* is replaced with the empty string.
Could someone explain how it works? More specifically, if there is a | between expressions, how does String.replaceAll() work with back references?
Thank you in advance.
Java String replaceAll(): the replaceAll() method replaces each substring of the string that matches the given regex with the specified replacement.
The replaceAll() method returns a new string in which every match of the pattern is replaced by the replacement. In Java, the pattern is a regular expression passed as a String, and the replacement is a String in which $1, $2, and so on refer to captured groups. The original string is left unchanged.
The difference between replace() and replaceAll() is that replace() treats its target as literal text (a char or a CharSequence), while replaceAll() treats its first argument as a regular expression. Both replace every occurrence.
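To make the literal-versus-regex distinction concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.replace and String.replaceAll come from the JDK.

```java
class ReplaceVsReplaceAll {
    // replace() takes the target literally: only the dot character is replaced.
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() takes a regex: an unescaped dot matches any character.
    static String regexDots(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaping the dot makes replaceAll() behave like the literal version.
    static String escapedDots(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b")); // a-b
        System.out.println(regexDots("a.b"));   // ---
        System.out.println(escapedDots("a.b")); // a-b
    }
}
```

This is also why the * in the question has to be written as \\* inside the pattern: unescaped, it would be read as a quantifier.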
I'll try to explain what's happening in regex.
str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
$1 represents the first group, which is (^\\*), and $2 represents the second group, (\\*$).
When you call str.replaceAll, every * is matched by one of the three alternatives, and each match is replaced with whatever the two groups captured during that match. A group that did not take part in the match contributes an empty string.
Example: *abc**def* --> *abcdef*
The regex first finds the * at the start of the string and puts it in group $1. It then keeps looking and matches each * in the middle without capturing it. Finally it finds the * at the end of the string and stores it in $2. When replacing, every * is eliminated except the ones stored in $1 or $2.
For more information see Capture Groups
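One way to see this for yourself is to walk the matches one at a time and print what each group holds. The following is only a sketch; the class name and output format are invented for illustration, while Pattern, Matcher.find(), Matcher.start() and Matcher.group(int) are standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsPerMatch {
    static final Pattern STARS = Pattern.compile("(^\\*)|(\\*$)|\\*");

    // Returns one entry per match: its start index and the content of both groups.
    // A group that did not participate in the match is reported as null.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = STARS.matcher(input);
        while (m.find()) {
            out.add(m.start() + ": $1=" + m.group(1) + ", $2=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // 0: $1=*, $2=null
        // 4: $1=null, $2=null
        // 5: $1=null, $2=null
        // 9: $1=null, $2=*
        describe("*abc**def*").forEach(System.out::println);
    }
}
```

A null group is substituted as an empty string in the replacement, which is why the two middle matches simply disappear.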
You can use lookarounds in your regex:
String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");
RegEx Demo
RegEx Breakup:
(?<!^) # the current position is not the start of the string
\\*+ # match 1 or more *
(?!$) # the position after the match is not the end of the string
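As a quick check, the lookaround pattern can be run against the four examples from the question. This is a small sketch; the class and method names are hypothetical.

```java
class LookaroundStars {
    // Removes every run of '*' that neither starts at the beginning of the
    // string nor ends at the end of the string.
    static String strip(String s) {
        return s.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "*******", "*abc**def*"};
        for (String in : inputs) {
            System.out.println(in + " --> " + strip(in));
        }
    }
}
```

For ******* the run starting at the second character first grabs everything up to the end, fails the (?!$) check, and gives back one * so that the last character survives.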
OP's regex is:
(^\*)|(\*$)|\*
It uses 2 capture groups, one for the * at the start and another for the * at the end, and uses back-references in the replacement. That works here, but it will be slower on larger strings, as the number of steps taken in this demo shows: 209 steps versus 48 with look-arounds.
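Another small improvement to OP's regex is to use a quantifier. Note that a bare \*+ in the third alternative would also swallow the final * of a run that reaches the end of the string (******* would become *), so the run needs a lookahead to stop before the end:
(^\*)|(\*$)|\*+(?!$)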
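It is worth testing the quantifier variant on a string that ends in a run of stars. The sketch below compares a bare \*+ as the third alternative with a version guarded by (?!$). The guarded pattern and all names here are illustrative assumptions, not something taken from a reference.

```java
class QuantifierVariant {
    // Third alternative is a bare \*+ : a run reaching the end of the string
    // is consumed whole, including the final '*'.
    static String bare(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    // Third alternative is \*+(?!$) : the run must stop before the end,
    // leaving the last '*' for the second alternative.
    static String guarded(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+(?!$)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(bare("*abc**def*"));    // *abcdef*
        System.out.println(bare("*******"));       // *
        System.out.println(guarded("*abc**def*")); // *abcdef*
        System.out.println(guarded("*******"));    // **
    }
}
```

The two agree on *abc**def* but differ on *******, where only the guarded form gives the ** the question asks for.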
Well, let's first take a look at your regex (^\\*)|(\\*$)|\\*: it matches every *. If the * is at the start, it is captured into group 1; if it is at the end, it is captured into group 2. Every other * is matched, but not put into any group.
The replacement pattern $1$2 replaces every single match with the content of group 1 and group 2. So for a * at the beginning or the end of the string, the content of one of the groups is that * itself, and it is therefore replaced by itself. For all the other matches, both groups are empty, so the matched * is replaced with an empty string.
Your problem was probably that $1$2 is not a literal replacement, but a backreference to captured groups.
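To see the difference between a backreference and literal text, you can compare the normal call with one where the replacement is passed through Matcher.quoteReplacement, which escapes the $ signs. A minimal sketch, with a made-up class name and method names:

```java
import java.util.regex.Matcher;

class LiteralVsBackreference {
    static final String REGEX = "(^\\*)|(\\*$)|\\*";

    // "$1$2" is interpreted: each match is replaced by the content of groups 1 and 2.
    static String withBackreferences(String s) {
        return s.replaceAll(REGEX, "$1$2");
    }

    // quoteReplacement escapes the '$' signs, so every match is replaced
    // by the four literal characters $1$2.
    static String withLiteralText(String s) {
        return s.replaceAll(REGEX, Matcher.quoteReplacement("$1$2"));
    }

    public static void main(String[] args) {
        System.out.println(withBackreferences("*a*b*")); // *ab*
        System.out.println(withLiteralText("*a*b*"));    // $1$2a$1$2b$1$2
    }
}
```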