Java String.replaceAll() with back reference

Tags:

java

regex

Here is a Java regex question: given a string, keep a "*" if it is at the start or the end of the string, and remove it otherwise. For example:

  1. * --> *
  2. ** --> **
  3. ******* --> **
  4. *abc**def* --> *abcdef*

The answer is:

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
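To check the answer against the four examples above, a minimal self-contained harness might look like the following. The class name EdgeStarsDemo and the helper keepEdgeStars are hypothetical names chosen for illustration; the regex and the replacement string are taken unchanged from the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical harness: runs the regex from the answer on the four sample inputs.
class EdgeStarsDemo {
    // Keeps a leading or trailing '*' and removes every other '*'.
    static String keepEdgeStars(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    public static void main(String[] args) {
        // Input -> expected output, in the order listed in the question.
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("*", "*");
        cases.put("**", "**");
        cases.put("*******", "**");
        cases.put("*abc**def*", "*abcdef*");
        for (Map.Entry<String, String> e : cases.entrySet()) {
            String actual = keepEdgeStars(e.getKey());
            System.out.println(e.getKey() + " --> " + actual
                    + (actual.equals(e.getValue()) ? "" : "  (expected " + e.getValue() + ")"));
        }
    }
}
```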

I tried the answer on my machine and it works, but I don't understand how.

From my understanding, every matched substring should be replaced with $1$2. However, it behaves as if:

  1. (^\\*) replaced with $1,
  2. (\\*$) replaced with $2,
  3. \\* replaced with empty.

Could someone explain how it works? More specifically, when there is a | between the alternatives, how does String.replaceAll() handle the back references?

Thank you in advance.

Jeffrey asked Mar 28 '16 at 17:03



3 Answers

I'll try to explain what's happening in the regex.

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");

$1 refers to the first group, (^\\*), and $2 refers to the second group, (\\*$).

When you call str.replaceAll, every * matched by any of the three alternatives is replaced with whatever groups 1 and 2 captured for that particular match; the third alternative, \\*, captures nothing.

Example: *abc**def* --> *abcdef*

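The regex finds the * at the start of the string and captures it in group 1 ($1). It then keeps scanning: the two * in the middle are matched by the third alternative without being captured, and the * at the end of the string is captured in group 2 ($2). When replacing, every * is therefore eliminated except the ones stored in $1 or $2.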
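To see this match-by-match behaviour, the sketch below (hypothetical class MatchTrace) walks the matches with Matcher.find() and records what group 1 and group 2 hold for each one, along with what $1$2 would expand to. For *abc**def* it should report four separate matches: the leading * (group 1), the two inner * (no group), and the trailing * (group 2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tracer: lists every match of the answer's regex and what "$1$2" expands to.
class MatchTrace {
    static final Pattern EDGE_STARS = Pattern.compile("(^\\*)|(\\*$)|\\*");

    static List<String> trace(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = EDGE_STARS.matcher(input);
        while (m.find()) {
            // group(n) is null when that group did not take part in this match
            String g1 = m.group(1);
            String g2 = m.group(2);
            // Mirrors the replacement: a non-participating group contributes nothing
            String replacement = (g1 == null ? "" : g1) + (g2 == null ? "" : g2);
            lines.add("index " + m.start() + ": $1=" + g1 + ", $2=" + g2
                    + " -> \"" + replacement + "\"");
        }
        return lines;
    }

    public static void main(String[] args) {
        trace("*abc**def*").forEach(System.out::println);
    }
}
```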

For more information see Capture Groups

Saleem answered Oct 16 '22 at 16:10


You can use lookarounds in your regex:

String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");

RegEx Demo

RegEx Breakup:

(?<!^)   # if the current position is not the start of the input
\*+      # match one or more *
(?!$)    # if the position after the match is not the end of the input
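A small sketch comparing the lookaround regex with the capture-group regex from the question on the same inputs (the class and method names are hypothetical). On these examples both are expected to produce the same result; the lookaround version uses an empty replacement string, so no back-references are involved at all.

```java
// Hypothetical comparison of the lookaround regex and the question's capture-group regex.
class LookaroundDemo {
    // Removes every '*' except one at the very start or the very end of the input.
    static String withLookarounds(String str) {
        return str.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    // The original capture-group version, for comparison.
    static String withGroups(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "*******", "*abc**def*"};
        for (String in : inputs) {
            System.out.println(in + " --> " + withLookarounds(in)
                    + " (groups: " + withGroups(in) + ")");
        }
    }
}
```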

OP's regex is:

(^\*)|(\*$)|\*

It uses two capturing groups, one for the * at the start and another for the * at the end, and uses back-references in the replacement. That works here, but it will be much slower to finish on larger strings, as evident from the number of steps taken in this demo: 209 steps versus 48 steps using lookarounds.
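The step counts quoted come from the linked demo, not from Java itself. To get a rough feel on the JVM, one could time both versions on a long input, as in the sketch below. The input shape (blocks of abc separated by runs of five *), its size and the class name RegexTiming are arbitrary assumptions, and a single System.nanoTime() measurement without warm-up is only indicative, so no particular timings are claimed here.

```java
// Rough timing sketch, not a rigorous benchmark: single run, no JIT warm-up control.
class RegexTiming {
    static final String GROUPS = "(^\\*)|(\\*$)|\\*";
    static final String LOOKAROUNDS = "(?<!^)\\*+(?!$)";

    // Builds a hypothetical test input: blocks of "abc" separated by runs of '*'.
    static String buildInput(int blocks) {
        StringBuilder sb = new StringBuilder("*");
        for (int i = 0; i < blocks; i++) {
            sb.append("abc*****");
        }
        return sb.append("def*").toString();
    }

    // Measures the wall-clock time of one run of the given task.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String input = buildInput(100_000);
        String[] out = new String[2];
        long tGroups = timeNanos(() -> out[0] = input.replaceAll(GROUPS, "$1$2"));
        long tLook = timeNanos(() -> out[1] = input.replaceAll(LOOKAROUNDS, ""));
        System.out.println("same result: " + out[0].equals(out[1]));
        System.out.printf("groups: %d ms, lookarounds: %d ms%n",
                tGroups / 1_000_000, tLook / 1_000_000);
    }
}
```

The capture-group version performs one match (and one replacement) per *, while the lookaround version removes a whole run of * in a single match, which is one plausible reason for the difference in step counts.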

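Another smaller improvement to OP's regex is to use a quantifier. It needs a (?!$) guard, because a bare \*+ would also consume the trailing * (turning ******* into * instead of **):

(^\*)|(\*$)|\*+(?!$)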
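A quick check of the quantified form (the class QuantifiedDemo and its method names are hypothetical). In this sketch the third alternative has no anchor, so a bare \*+ can run all the way to the end of the input and swallow the trailing * before the (\*$) alternative gets a chance; adding a (?!$) guard, as in the lookaround regex, leaves the last * for group 2.

```java
// Hypothetical check of the quantified variant of the capture-group regex.
class QuantifiedDemo {
    // Quantified form guarded by (?!$) so a run of '*' cannot swallow the trailing '*'.
    static String guarded(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*+(?!$)", "$1$2");
    }

    // Bare \*+ alternative, for comparison: the run can extend to the end of the input.
    static String bare(String str) {
        return str.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    public static void main(String[] args) {
        for (String in : new String[] {"*******", "abc**", "*abc**def*"}) {
            System.out.println(in + " --> guarded: " + guarded(in) + ", bare: " + bare(in));
        }
    }
}
```

For example, ******* becomes ** with the guard and * without it, and abc** becomes abc* versus abc; on *abc**def* both give *abcdef*.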

anubhava answered Oct 16 '22 at 16:10


Well, let's first take a look at your regex, (^\\*)|(\\*$)|\\*. It matches every *: if the * is at the start, it is captured into group 1; if it is at the end, it is captured into group 2; every other * is matched but not put into any group.

The replacement pattern $1$2 replaces every single match with the contents of group 1 followed by the contents of group 2. So when a * is at the beginning or the end of the string, one of the groups holds that * itself, and the match is therefore replaced by itself. For all other matches, neither group participated, so both references expand to empty strings and the matched * is replaced with an empty string.
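The Javadoc of String.replaceAll states that it yields the same result as Pattern.compile(regex).matcher(str).replaceAll(replacement). The sketch below spells that loop out by hand for this particular pattern (ManualReplace is a hypothetical name), to make the per-match expansion of $1$2 explicit: text between matches is copied unchanged, and each match is replaced by group 1 followed by group 2, with a group that did not participate contributing nothing. It is an illustration of the behaviour, not the JDK's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-rolled equivalent of str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2"),
// written out to show what the replacement does for each individual match.
class ManualReplace {
    static String replaceEdgeStars(String str) {
        Matcher m = Pattern.compile("(^\\*)|(\\*$)|\\*").matcher(str);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(str, last, m.start()); // text between matches is copied unchanged
            // "$1$2": a group that did not participate (group(n) == null) contributes nothing
            if (m.group(1) != null) out.append(m.group(1));
            if (m.group(2) != null) out.append(m.group(2));
            last = m.end();
        }
        out.append(str, last, str.length()); // copy the tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "*abc**def*";
        System.out.println(replaceEdgeStars(in));
        System.out.println(in.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2"));
    }
}
```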

Your problem was probably that $1$2 is not a literal replacement, but a pair of references to captured groups.
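To see the difference, the sketch below (hypothetical class LiteralVsGroups) runs the same regex once with $1$2 as group references and once with Matcher.quoteReplacement("$1$2"), which escapes the $ signs so that the text is inserted literally.

```java
import java.util.regex.Matcher;

// Hypothetical contrast: "$1$2" as group references vs. "$1$2" as literal text.
class LiteralVsGroups {
    static final String REGEX = "(^\\*)|(\\*$)|\\*";

    // $1 and $2 are expanded to the text captured by groups 1 and 2.
    static String asGroupRefs(String str) {
        return str.replaceAll(REGEX, "$1$2");
    }

    // Matcher.quoteReplacement escapes '$' and '\', so "$1$2" is inserted verbatim.
    static String asLiteral(String str) {
        return str.replaceAll(REGEX, Matcher.quoteReplacement("$1$2"));
    }

    public static void main(String[] args) {
        System.out.println(asGroupRefs("*abc*"));
        System.out.println(asLiteral("*abc*"));
    }
}
```

On *abc* the first call returns *abc*, while the literal version replaces both stars with the text $1$2, giving $1$2abc$1$2.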

Sebastian Proske answered Oct 16 '22 at 15:10