How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
What I unfortunately get:
foo\\1bar
How do I escape here correctly?
To insert a backslash into your regular expression pattern, use a double backslash ('\\').
The backreference \1 (backslash one) refers to the first capturing group: it matches exactly the same text that the first capturing group matched. When a pattern ends in something like </\1> to match a closing HTML tag, the / before \1 is simply a literal forward slash.
To match a character that has a special meaning in a regex, escape it by prefixing it with a backslash ( \ ). E.g., \. matches "."
=~ is Ruby's basic pattern-matching operator. When one operand is a regular expression and the other is a string, the regular expression is used as a pattern to match against the string. (This operator is equivalently defined by Regexp and String, so the order of the String and the Regexp does not matter.)
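A minimal sketch of =~ with the operands in both orders; the variable names are only illustrative:

```ruby
# =~ returns the index of the first match, or nil when nothing matches.
str_first = "foo+bar" =~ /\+/   # String on the left
re_first  = /\+/ =~ "foo+bar"   # Regexp on the left
no_match  = "foobar" =~ /\+/    # no "+" in the string

puts str_first          # 3
puts re_first           # 3
puts no_match.inspect   # nil
```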
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
The reason is that the replacement string is parsed twice: once by Ruby and once by the underlying regular expression engine, for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity where '\\1' and '\1' are equivalent but '\' and '\\' are not.)
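To see the first of those two parsing steps in isolation, it can help to inspect what each literal contains once Ruby has read it. This is just a sketch; the variable names are made up for illustration:

```ruby
# What each literal holds after Ruby's string literal parser is done with it.
single_two = '\\1'   # single quotes: the pair of backslashes collapses to one
single_one = '\1'    # single quotes: \1 is not an escape, so it is kept as-is
double_two = "\\1"   # double quotes: the pair of backslashes collapses to one
double_one = "\1"    # double quotes: octal escape, a single character with code 1

puts single_two.length          # 2
puts single_two == single_one   # true
puts single_two == double_two   # true
puts double_one.length          # 1
puts double_one.ord             # 1
```

The first three literals all hand the two characters \1 to gsub; only the last one loses the backslash entirely.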
So for example, a simple replacement here with a captured group and a double quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which it understands as a reference to a capture group. In Ruby string literals, "\1" means something else entirely (ASCII character 1).
What we actually want in this case is for the regexp engine to receive \\\1. It also understands \ as an escape character, so \\1 is not sufficient and will simply evaluate to the literal output \1. So, we need \\\1 in the regexp engine, but to get to that point we need to also make it past Ruby's string literal parser.
To do that, we take our desired regexp input and double every backslash again to get through Ruby's string literal parser. \\\1 therefore requires "\\\\\\1". In the case of single quotes one backslash can be omitted, as \1 is not a valid escape sequence in single quotes and is treated literally.
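Both layers can be checked separately. The sketch below first builds the replacement strings from a single backslash character, so that their contents are unambiguous, and then confirms that the literal forms discussed above produce the three-backslash version. The variable names are illustrative, and the block form at the end is an alternative I am adding, not something from the question:

```ruby
bs = "\\"  # a single backslash character

# Contents the regexp engine receives, built without relying on literal escaping.
one_bs   = bs + "1"       # \1   : backreference to group 1
two_bs   = bs * 2 + "1"   # \\1  : escaped backslash followed by a literal 1
three_bs = bs * 3 + "1"   # \\\1 : escaped backslash followed by the backreference

r1 = "foo+bar".gsub(/(\+)/, one_bs)
r2 = "foo+bar".gsub(/(\+)/, two_bs)
r3 = "foo+bar".gsub(/(\+)/, three_bs)
puts r1   # foo+bar
puts r2   # foo\1bar
puts r3   # foo\+bar

# Literal forms that all produce the three-backslash string.
double_q = "\\\\\\1"
single_6 = '\\\\\\1'
single_5 = '\\\\\1'
puts [double_q, single_6, single_5].all? { |s| s == three_bs }   # true

# Block form: the return value is inserted as-is, with no second round of escaping.
r4 = "foo+bar".gsub(/(\+)/) { |match| bs + match }
puts r4   # foo\+bar
```

If the backslash counting gets hard to read, the block form sidesteps the replacement-string escaping altogether.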
One reason this problem is usually hidden is the use of /.+/ style regexp literals, which Ruby treats in a special way to avoid the need to double-escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you use a string literal instead of a regexp literal in Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for the regexp engine to understand it as a literal dot: "." and "\." both evaluate to . in double-quoted strings, whereas the engine itself needs to receive \. in order to match one.
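One way to confirm what the engine actually received is to look at the compiled pattern's source. A small sketch, with illustrative variable names; Regexp.escape is shown only as a convenient way to have Ruby produce the escaped string for you:

```ruby
unescaped = Regexp.new("\.")    # the string is just a dot, so this matches any character
escaped   = Regexp.new("\\.")   # the string is backslash + dot, so this matches a literal dot

puts unescaped.source   # .
puts escaped.source     # \.

puts unescaped.match("a").nil?   # false
puts escaped.match("a").nil?     # true
puts escaped == /\./             # true

# Regexp.escape builds the escaped string from the raw character.
puts Regexp.escape(".") == "\\."   # true
```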