I have a variable v that possibly appears more than one time consecutively in a string. I want to make it so that all consecutive vs turn into just one v. For example:
String s = "Hello, world!";
String v = "l";
The regex would turn "Hello, world!" into "Helo, world!"
So I want to do something like
s = s.replaceAll(vv+, v)
But obviously that won't work. Thoughts?
Let's develop the solution iteratively: at each step we point out the problems and fix them, until we arrive at the final answer.
We can start with something like this:
String s = "What???? Impo$$ible!!!";
String v = "!";
s = s.replaceAll(v + "{2,}", v);
System.out.println(s);
// "What???? Impo$$ible!"
{2,} is the regex quantifier syntax for counted repetition; in this case it means "at least 2 of".
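As a quick illustration of the counted quantifier forms, here is a minimal sketch. The class name QuantifierDemo and the helper collapseBangs are hypothetical names chosen for this example only.

```java
class QuantifierDemo {
    // Hypothetical helper: collapses every run of two or more '!' into one '!'.
    static String collapseBangs(String s) {
        return s.replaceAll("!{2,}", "!");
    }

    public static void main(String[] args) {
        System.out.println("a!".matches("a!{2,}"));     // false: only one '!'
        System.out.println("a!!".matches("a!{2,}"));    // true: exactly two
        System.out.println("a!!!!".matches("a!{2,}"));  // true: no upper bound
        System.out.println("a!!!!".matches("a!{2,3}")); // false: at most three allowed
        System.out.println(collapseBangs("Hi!!! Bye!")); // Hi! Bye!
    }
}
```

Note that a single "!" is left alone, because {2,} needs at least two occurrences before it matches.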
It just so happens that the above works because ! is not a regex metacharacter. Let's see what happens if we try the following:
String v = "?";
s = s.replaceAll(v + "{2,}", v);
// Exception in thread "main" java.util.regex.PatternSyntaxException:
// Dangling meta character '?'
One way to fix the problem is to use Pattern.quote so that v is taken literally:
s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
System.out.println(s);
// "What? Impo$$ible!!!"
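To make the effect of Pattern.quote more concrete, here is a small sketch of what it returns and how it changes matching. QuoteDemo and matchesLiterally are hypothetical names introduced just for illustration.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true only if input is exactly the literal text,
    // with no character of the literal treated as a regex metacharacter.
    static boolean matchesLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        // Pattern.quote wraps the text in \Q...\E, which marks it as a literal.
        System.out.println(Pattern.quote("?"));             // \Q?\E
        System.out.println("abc".matches("a.c"));           // true: '.' matches any char
        System.out.println(matchesLiterally("abc", "a.c")); // false: '.' is now literal
        System.out.println(matchesLiterally("a.c", "a.c")); // true
    }
}
```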
It turns out that this isn't the only thing we need to worry about: in replacement strings, \ and $ are also special metacharacters. That explains why we get the following problem:
String v = "$";
s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
// Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
// String index out of range: 1
Since we want v to be taken literally as a replacement string, we use Matcher.quoteReplacement as follows:
s = s.replaceAll(Pattern.quote(v) + "{2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "What???? Impo$ible!!!"
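The difference between a replacement string that is interpreted and one that is taken literally can be sketched as follows. ReplacementDemo, swapPair and replaceLiterally are hypothetical names used only in this example.

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    // In a replacement string, $1 and $2 refer to captured groups.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)-(\\w+)", "$2-$1");
    }

    // Hypothetical helper: inserts the replacement text verbatim,
    // even if it contains '$' or '\'.
    static String replaceLiterally(String s, String regex, String literal) {
        return s.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(swapPair("left-right"));                   // right-left
        System.out.println(Matcher.quoteReplacement("$1"));           // \$1
        System.out.println(replaceLiterally("price: N", "N", "$1"));  // price: $1
    }
}
```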
Lastly, repetition has higher precedence than concatenation. This means the following:
System.out.println( "hahaha".matches("ha{3}") ); // false
System.out.println( "haaa".matches("ha{3}") ); // true
System.out.println( "hahaha".matches("(ha){3}") ); // true
So if v can contain multiple characters, you'd want to group it before applying the repetition. You can use a non-capturing group in this case, since you don't need to create a backreference.
String s = "well, well, well, look who's here...";
String v = "well, ";
s = s.replaceAll("(?:" + Pattern.quote(v) + "){2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "well, look who's here..."
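Putting the steps above together, one possible way to package the final expression is a small helper method. This is only a sketch: the class name CollapseRepeats, the method name collapse, and the decision to return the input unchanged for an empty v are assumptions of this example, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollapseRepeats {
    // Hypothetical helper: replaces every run of two or more consecutive
    // copies of v in s with a single v. Both the pattern and the
    // replacement are quoted so that v is always taken literally.
    static String collapse(String s, String v) {
        if (v.isEmpty()) {
            return s; // assumption: nothing to collapse for an empty v
        }
        String regex = "(?:" + Pattern.quote(v) + "){2,}";
        return s.replaceAll(regex, Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        String s = "What???? Impo$$ible!!!";
        System.out.println(collapse(s, "!")); // What???? Impo$$ible!
        System.out.println(collapse(s, "?")); // What? Impo$$ible!!!
        System.out.println(collapse(s, "$")); // What???? Impo$ible!!!
        System.out.println(collapse("well, well, well, look who's here...", "well, "));
        // well, look who's here...
    }
}
```

If the same v is applied to many strings, compiling the pattern once with Pattern.compile and reusing it would avoid recompiling it on every call.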
See also: Pattern.quote, Matcher.quoteReplacement, java.util.regex.Pattern, java.util.regex.Matcher
The following example uses reluctant repetition, a capturing group, and a backreference, combined with case-insensitive matching:
System.out.println(
    "omgomgOMGOMG???? Yes we can! YES WE CAN! GOAAALLLL!!!!"
        .replaceAll("(?i)(.+?)\\1+", "$1")
);
// "omg? Yes we can! GOAL!"
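To see which unit the pattern captures for each repeated run, the same regex can be driven through Pattern and Matcher directly. This is a sketch only; RepeatedUnitDemo, repeatedUnits and collapse are hypothetical names, and Pattern.CASE_INSENSITIVE is used here in place of the inline (?i) flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedUnitDemo {
    // (.+?) reluctantly captures the shortest unit that is followed by
    // one or more copies of itself (\1+), ignoring case.
    private static final Pattern REPEATED =
            Pattern.compile("(.+?)\\1+", Pattern.CASE_INSENSITIVE);

    // Returns the captured unit (group 1) of every repeated run found in s.
    static List<String> repeatedUnits(String s) {
        List<String> units = new ArrayList<>();
        Matcher m = REPEATED.matcher(s);
        while (m.find()) {
            units.add(m.group(1));
        }
        return units;
    }

    // Replaces each repeated run with its first unit.
    static String collapse(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "omgomgOMGOMG???? GOAAALLLL!!!!";
        System.out.println(repeatedUnits(s)); // [omg, ?, A, L, !]
        System.out.println(collapse(s));      // omg? GOAL!
    }
}
```

Because the unit is arbitrary, this variant collapses any repeated substring, whereas the Pattern.quote approach only collapses repeats of one specific v.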
Use x{2,} to match x at least twice.
To replace characters that have a special meaning in regexes, use Pattern.quote:
String part = Pattern.quote(v);
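// quoteReplacement keeps a '$' or '\' in v from being interpreted in the replacement
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));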
To replace things longer than one character, use non-capturing groups:
String part = "(?:" + Pattern.quote(v) + ")";
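// quoteReplacement keeps a '$' or '\' in v from being interpreted in the replacement
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));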