Why is "$1" ending up in my Regex.Replace() result?

Tags: c#, regex

I am trying to write a regular expression to rewrite URLs to point to a proxy server.

bodystring = Regex.Replace(bodystring, "(src='/+)", "$1" + proxyStr);

The idea of this expression is pretty simple: find instances of "src='/" or "src='//" and insert a proxy URL at that point. This works in general, but occasionally I have found cases where a literal "$1" ends up in the result string.

This makes no sense to me because if there was no match, then why would it replace anything at all?

Unfortunately I can't give a simple example of this, as it has only happened with very large strings so far, but I'd like to know conceptually what could make this sort of thing happen.

As an aside, I tried rewriting this expression using a positive lookbehind as follows:

bodystring = Regex.Replace(bodystring, "(?<=src='/+)", proxyStr);

But this ends up with proxyStr twice in the output if the input string contains "src='//". This also doesn't make much sense to me, because I thought that "src=" would have to be present in the input twice for proxyStr to end up twice in the output.

Locksleyu asked Dec 08 '11 15:12


3 Answers

When proxyStr = "10.15.15.15:8008/proxy?url=http://", the replacement string becomes "$110.15.15.15:8008/proxy?url=http://". It contains a reference to group number 110, which certainly does not exist.
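A minimal sketch that reproduces this, assuming a made-up one-tag input string. The class and method names are hypothetical and only there for illustration, and the second proxy value is an invented example of a string that does not start with a digit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarGroupDemo
{
    // Hypothetical helper wrapping the call from the question.
    public static string RewriteAmbiguous(string body, string proxyStr)
    {
        // "$1" + proxyStr is concatenated first and parsed as one replacement pattern.
        // If proxyStr starts with digits, they become part of the group number.
        return Regex.Replace(body, "(src='/+)", "$1" + proxyStr);
    }

    public static void Main()
    {
        string body = "<img src='/logo.png'>"; // assumed sample input

        // Starts with "10", so the pattern contains "$110": group 110 is undefined
        // and the text "$110" is written out literally.
        Console.WriteLine(RewriteAmbiguous(body, "10.15.15.15:8008/proxy?url=http://"));
        // prints: <img $110.15.15.15:8008/proxy?url=http://logo.png'>

        // Starts with a letter (invented value), so "$1" is read as group 1.
        Console.WriteLine(RewriteAmbiguous(body, "http://proxy.example/?url="));
        // prints: <img src='/http://proxy.example/?url=logo.png'>
    }
}
```

This would also explain why the call works "in general": the literal "$1" only shows up when the proxy string happens to begin with a digit.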

You need to make sure that the text following $1 does not start with a digit. In your case you can do that by not capturing the last slash and changing the replacement string to "$1/" + proxyStr, like this:

bodystring = Regex.Replace(bodystring, "(src='/*)/", "$1/" + proxyStr);

Edit:

Rawling pointed out that .NET's regexp library addresses this issue: you can enclose 1 in curly braces to avoid false aliasing, like this:

bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
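A small sketch of the braced form in use, with an assumed sample input and hypothetical class and method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracedGroupDemo
{
    // Hypothetical helper: same pattern as the question, but with "${1}".
    public static string RewriteBraced(string body, string proxyStr)
    {
        // The closing brace ends the group number, so digits at the start
        // of proxyStr can no longer be absorbed into it.
        return Regex.Replace(body, "(src='/+)", "${1}" + proxyStr);
    }

    public static void Main()
    {
        string proxyStr = "10.15.15.15:8008/proxy?url=http://";
        string body = "<img src='/b.png'>"; // assumed sample input

        Console.WriteLine(RewriteBraced(body, proxyStr));
        // prints: <img src='/10.15.15.15:8008/proxy?url=http://b.png'>
    }
}
```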
Sergey Kalinichenko answered Oct 23 '22 23:10


What you are doing can't be done this way. .NET only sees the replacement string after the concatenation, so it cannot tell where your group reference ends. Your problem is that your proxy string starts with a number: proxyStr = "10.15.15.15:8008/proxy?url=http://"

When you combine this with your $1, the regex engine thinks it has to look for backreference $110, which doesn't exist.

See what I mean here.

You can remedy this by matching something else, or by matching and constructing the replacement string manually etc. Use what suits you best.
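One possible reading of "constructing the replacement string manually" is to pass a MatchEvaluator delegate to Regex.Replace instead of a replacement pattern. The sketch below assumes that reading; the class and method names and the sample input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Hypothetical helper: builds the replacement in code for each match.
    public static string RewriteWithEvaluator(string body, string proxyStr)
    {
        // The lambda returns plain text, so proxyStr is never parsed
        // for "$" substitutions or group numbers.
        return Regex.Replace(body, "(src='/+)", m => m.Groups[1].Value + proxyStr);
    }

    public static void Main()
    {
        string proxyStr = "10.15.15.15:8008/proxy?url=http://";
        string body = "<img src='/logo.png'>"; // assumed sample input

        Console.WriteLine(RewriteWithEvaluator(body, proxyStr));
        // prints: <img src='/10.15.15.15:8008/proxy?url=http://logo.png'>
    }
}
```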

FailedDev answered Oct 23 '22 22:10


Based on dasblinkenlight's answer (already +1), the solution is this:

bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);

This ensures that group 1 is used and that the following digits are not read as part of a larger group number.

stema answered Oct 24 '22 00:10