I am trying to write a regular expression to rewrite URLs to point to a proxy server.
bodystring = Regex.Replace(bodystring, "(src='/+)", "$1" + proxyStr);
The idea of this expression is pretty simple, basically find instances of "src='/" or "src='//" and insert a PROXY url at that point. This works in general but occasionally I have found cases where a literal "$1" will end up in the result string.
This makes no sense to me because if there was no match, then why would it replace anything at all?
Unfortunately I can't give a simple example of this at it only happens with very large strings so far, but I'd like to know conceptually what could make this sort of thing happen.
As an aside, I tried rewriting this expression using a positive lookbehind as follows:
bodystring = Regex.Replace(bodystring, "(?<=src='/+)", proxyStr);
But this ends up with proxyStr TWICE in the output if the input string contains "src='//". This also doesn't make much sense to me because I thought that "src=" would have to be present in the input twice in order to get proxyStr to end up twice in the output.
When proxyStr = "10.15.15.15:8008/proxy?url=http://"
, the replacement string becomes "$110.15.15.15:8008/proxy?url=http://"
. It contains a reference to group number 110, which certainly does not exist.
You need to make sure that your proxy string does not start in a digit. In your case you can do it by not capturing the last slash, and changing the replacement string to "$1/"+proxyStr
, like this:
bodystring = Regex.Replace(bodystring, "(src='/*)/", "$1/" + proxyStr);
Edit:
Rawling pointed out that .NET's regexp library addresses this issue: you can enclose 1
in curly braces to avoid false aliasing, like this:
bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
What you are doing can't be done. .NET has trouble when interpolating variable like this. Your problem is that your Proxy string starts with a number : proxyStr = "10.15.15.15:8008/proxy?url=http://"
When you combine this with your $1
, the regex thing it has to look for backreference $110
which doesn't exist.
See what I mean here.
You can remedy this by matching something else, or by matching and constructing the replacement string manually etc. Use what suits you best.
Based on dasblinkenlights answer (already +1) the solution is this:
bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
This ensures that the group 1 is used and not a new group number is build.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With