
Regular expressions in java with a variable

I have a variable v that possibly appears more than one time consecutively in a string. I want to make it so that all consecutive vs turn into just one v. For example:

String s = "Hello, world!";
String v = "l";

The regex would turn "Hello, world!" into "Helo, world!"

So I want to do something like

s = s.replaceAll(vv+, v)

But obviously that won't work. Thoughts?

asked Nov 28 '22 08:11 by rhombidodecahedron


2 Answers

Let's develop the solution iteratively: at each step we point out a problem and fix it, until we arrive at the final answer.

We can start with something like this:

String s = "What???? Impo$$ible!!!";
String v = "!";

s = s.replaceAll(v + "{2,}", v);
System.out.println(s);
// "What???? Impo$$ible!"

{2,} is the regex syntax for finite repetition, meaning "at least 2 of" in this case.
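
To see the quantifier in isolation, here is a minimal sketch. The class name QuantifierDemo, the helper collapseBangs, and the sample strings are made up for illustration; only String.matches and String.replaceAll come from the standard library.

```java
class QuantifierDemo {
    // Collapses every run of two or more '!' into a single '!'.
    static String collapseBangs(String s) {
        return s.replaceAll("!{2,}", "!");
    }

    public static void main(String[] args) {
        System.out.println("!".matches("!{2,}"));     // false: only one '!'
        System.out.println("!!".matches("!{2,}"));    // true: exactly two
        System.out.println("!!!!!".matches("!{2,}")); // true: no upper bound
        System.out.println(collapseBangs("Hi! Wow!!! Ok!!")); // Hi! Wow! Ok!
    }
}
```

A single "!" is left untouched because {2,} never matches fewer than two occurrences.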

It just so happens that the above works because ! is not a regex metacharacter. Let's see what happens if we try the following:

String v = "?";

s = s.replaceAll(v + "{2,}", v);
// Exception in thread "main" java.util.regex.PatternSyntaxException:       
// Dangling meta character '?'

One way to fix the problem is to use Pattern.quote so that v is taken literally:

s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
System.out.println(s);
// "What? Impo$$ible!!!"
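
To make it clearer what Pattern.quote does, the following sketch prints the quoted form of the pattern. The class name QuoteDemo and the helper repeatedLiteral are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Builds a "two or more of v" pattern with v taken literally.
    // Only suitable for a single-character v: the quantifier binds
    // to the last character of the quoted text.
    static String repeatedLiteral(String v) {
        return Pattern.quote(v) + "{2,}";
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("?"));   // \Q?\E
        System.out.println(repeatedLiteral("?")); // \Q?\E{2,}
        System.out.println("What????".replaceAll(repeatedLiteral("?"), "?")); // What?
    }
}
```

Everything between \Q and \E is treated as literal text, so the ? loses its special meaning as a quantifier.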

It turns out that this isn't the only thing we need to worry about: in replacement strings, \ and $ are also special metacharacters. That explains why we get the following problem:

String v = "$";
s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
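// Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
// String index out of range: 1
// (Newer Java versions throw IllegalArgumentException:
// "Illegal group reference: group index is missing" instead.)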

Since we want v to be taken literally as a replacement string, we use Matcher.quoteReplacement as follows:

s = s.replaceAll(Pattern.quote(v) + "{2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "What???? Impo$ible!!!"
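
As a rough illustration of what Matcher.quoteReplacement returns, consider the sketch below. The class name ReplacementDemo and the helper replaceLiterally are invented for this example.

```java
import java.util.regex.Matcher;

class ReplacementDemo {
    // Replaces every match of regex in s with replacement, taken literally.
    static String replaceLiterally(String s, String regex, String replacement) {
        return s.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(Matcher.quoteReplacement("$"));      // \$
        System.out.println(Matcher.quoteReplacement("a\\b"));   // a\\b
        System.out.println(replaceLiterally("x-x", "x", "$1")); // $1-$1
    }
}
```

The method simply escapes each \ and $ with a backslash, so "$1" is inserted as text rather than read as a reference to group 1.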

Lastly, repetition has higher precedence than concatenation. This means the following:

System.out.println(  "hahaha".matches("ha{3}")    ); // false
System.out.println(  "haaa".matches("ha{3}")      ); // true
System.out.println(  "hahaha".matches("(ha){3}")  ); // true

So if v can contain multiple characters, you'd want to group it before applying the repetition. You can use a non-capturing group in this case, since you don't need to create a backreference.

String s = "well, well, well, look who's here...";
String v = "well, ";
s = s.replaceAll("(?:" + Pattern.quote(v) + "){2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "well, look who's here..."

Summary

  • To match an arbitrary literal string that may contain regex metacharacters, use Pattern.quote
  • To replace with an arbitrary literal string that may contain replacement metacharacters, use Matcher.quoteReplacement
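
Putting both points together, a helper along these lines could be used. This is a minimal sketch: the class name CollapseRepeats, the method name collapse, and the empty-string guard are assumptions added for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollapseRepeats {
    // Replaces every run of two or more consecutive copies of v with a single v.
    static String collapse(String s, String v) {
        if (v.isEmpty()) {
            return s; // assumption: an empty v means there is nothing to collapse
        }
        // Group the quoted literal so {2,} applies to all of v, not just its last char.
        String regex = "(?:" + Pattern.quote(v) + "){2,}";
        return s.replaceAll(regex, Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello, world!", "l"));          // Helo, world!
        System.out.println(collapse("What???? Impo$$ible!!!", "?")); // What? Impo$$ible!!!
        System.out.println(collapse("What???? Impo$$ible!!!", "$")); // What???? Impo$ible!!!
        System.out.println(collapse("well, well, well, look who's here...", "well, "));
        // well, look who's here...
    }
}
```

If the same v is applied to many strings, compiling the pattern once with Pattern.compile and reusing it would avoid recompiling on every call.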

References

  • java.util.regex.Pattern
  • java.util.regex.Matcher
  • regular-expressions.info
    • Finite Repetition

Bonus material

The following example combines reluctant repetition, a capturing group, and a backreference with case-insensitive matching:

    System.out.println(
        "omgomgOMGOMG???? Yes we can! YES WE CAN! GOAAALLLL!!!!"
            .replaceAll("(?i)(.+?)\\1+", "$1")
    );
    // "omg? Yes we can! GOAL!"
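
The same idea can be written with a precompiled Pattern, using the Pattern.CASE_INSENSITIVE flag in place of the inline (?i). This is only a sketch; the class name BackrefDemo and the method collapseAny are hypothetical.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // (.+?) reluctantly captures the shortest candidate unit;
    // \1+ then requires one or more further copies of it right after.
    private static final Pattern REPEATED =
        Pattern.compile("(.+?)\\1+", Pattern.CASE_INSENSITIVE);

    // Collapses any consecutively repeated substring into one copy (the first).
    static String collapseAny(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseAny("omgomgOMGOMG"));  // omg
        System.out.println(collapseAny("GOAAALLLL!!!!")); // GOAL!
    }
}
```

Here the replacement "$1" is deliberately left unquoted, because it is meant as a reference to the captured group rather than as literal text.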

Related questions

  • Regex to match tags like <A>, <BB>, <CCC> but not <ABC>

References

  • regular-expressions.info/Brackets for Grouping and Backreference

answered Dec 17 '22 03:12 by polygenelubricants


Use x{2,} to match x at least twice.

To match characters that have a special meaning in regexes, use Pattern.quote (and Matcher.quoteReplacement for the replacement, in case v contains $ or \):

String part = Pattern.quote(v);
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));

To replace things longer than one character, wrap the quoted text in a non-capturing group:

String part = "(?:" + Pattern.quote(v) + ")";
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));

answered Dec 17 '22 03:12 by gustafc