
scala regex replaceAllIn can't replace when replace string looks like a regex?

Tags: regex, scala

I've been happily running a Regex replaceAllIn for quite a while but ran into a problem when the replacement string had something that looked like a regex in it. The following illustrates the problem (Scala 2.9.1-1). Note that the real problem space is much more complex, so the idea of using a simpler solution isn't really tenable (just to preempt the inevitable "Why don't you try ..." :D)

val data = "val re = \"\"\"^[^/]*://[^/]*/[^/]*$\"\"\".r"
val source = """here
LATEX_THING{abc}
there"""
val re = "LATEX_THING\\{abc\\}".r
println(re.replaceAllIn(source, data))

This presents with the following error:

java.lang.IllegalArgumentException: Illegal group reference

If I change data from what it was to something simple like:

val data = "This will work"

Then everything's fine.

It looks like replaceAllIn is somehow looking in the second string and using it as another RE to reference what was remembered from the first RE... but the docs say nothing about this.

What am I missing?

edit: Ok, so after looking at the java.util.regex.Matcher class, it would seem that the intended fix is:

re.replaceAllIn(source, java.util.regex.Matcher.quoteReplacement(data))
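For reference, here is a minimal, self-contained sketch of that fix applied to the example above. The object name QuoteReplacementDemo and the method name replaced are only illustrative, and the multi-line source string is written with explicit \n so the result does not depend on file line endings.

```scala
import java.util.regex.Matcher

object QuoteReplacementDemo {
  // Replacement text containing a '$' that is not a valid group reference.
  val data = "val re = \"\"\"^[^/]*://[^/]*/[^/]*$\"\"\".r"
  val source = "here\nLATEX_THING{abc}\nthere"
  val re = "LATEX_THING\\{abc\\}".r

  // quoteReplacement escapes '\' and '$', so data is inserted literally
  // instead of being scanned for group references.
  def replaced: String =
    re.replaceAllIn(source, Matcher.quoteReplacement(data))

  def main(args: Array[String]): Unit = println(replaced)
}
```

With the quoting in place, the matched LATEX_THING{abc} line is replaced by the contents of data unchanged.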

Derek Wyatt asked Mar 11 '12 20:03


1 Answer

You need to escape the $ in your replacement string:

val data = "val re = \"\"\"^[^/]*://[^/]*/[^/]*\\$\"\"\".r"

Otherwise it's interpreted as the beginning of a group reference, which is only valid if the $ is followed by a group number (or, on Java 7 and later, a named group written as ${name}). See the documentation for java.util.regex.Matcher for more detail:

The replacement string may contain references to subsequences captured during the previous match: Each occurrence of $g will be replaced by the result of evaluating group(g)... A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
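A small sketch of the three cases the documentation describes may help: a valid group reference, an escaped dollar sign, and an invalid reference. The pattern, the object name GroupReferenceDemo, and the method names are made up for illustration and are not part of the question's code.

```scala
import scala.util.Try

object GroupReferenceDemo {
  // Two capture groups: the text before and after the '@'.
  val re = "(\\w+)@(\\w+)".r

  // $1 and $2 are replaced by the first and second captured groups.
  def swapped(s: String): String = re.replaceAllIn(s, "$2 at $1")

  // "\\$1" is the replacement \$1: an escaped, literal '$' followed by '1'.
  def literalDollar(s: String): String = re.replaceAllIn(s, "\\$1")

  // A '$' followed by something that is not a group number is rejected,
  // so the call throws and Try captures the failure.
  def invalidReference(s: String): Try[String] =
    Try(re.replaceAllIn(s, "$x"))

  def main(args: Array[String]): Unit = {
    println(swapped("user@host"))
    println(literalDollar("user@host"))
    println(invalidReference("user@host").isFailure)
  }
}
```

The last case is the same situation as in the question: the $ before the closing quotes in data is followed by a character that cannot start a group reference.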

Update to address your comment and edit above: yes, you can use Matcher.quoteReplacement if you're not working with string literals. (You can use it with literals too, I guess, but escaping the $ seems easier in that case.) There's at least a chance that quoteReplacement will be available as a method on scala.util.matching.Regex in the future.
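For readers on more recent Scala releases than the 2.9.1 used in the question: the Regex companion object does provide a quoteReplacement helper there, so the Java class need not be referenced directly. The sketch below assumes such a release; the PLACEHOLDER pattern and the fill method are hypothetical names chosen for the example.

```scala
import scala.util.matching.Regex

object RegexQuoteReplacementDemo {
  // Hypothetical marker to be substituted in a template string.
  val marker = "PLACEHOLDER".r

  // Regex.quoteReplacement escapes '\' and '$' in value, so arbitrary
  // runtime text can be used safely as the replacement.
  def fill(template: String, value: String): String =
    marker.replaceAllIn(template, Regex.quoteReplacement(value))

  def main(args: Array[String]): Unit =
    println(fill("price: PLACEHOLDER", "$5"))
}
```

This is mainly useful when the replacement comes from user input or a file, where escaping by hand is not an option.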

Travis Brown answered Oct 07 '22 11:10