I would like to replace only the group in parenthesis in this expression :
my_string.gsub(/<--MARKER_START-->(.)*<--MARKER_END-->/, 'replace_text')
so that I get : <--MARKER_START-->replace_text<--MARKER_END-->
I know I could repeat the whole MARKER_START
and MARKER_END
blocks in the substitution expression but I thought there should be a more simple way to do this.
You can do it with zero width look-ahead and look-behind assertions.
This regex should work in ruby 1.9 and in perl and many other places:
Note: ruby 1.8 only supports look-ahead assertions. You need both look-ahead and look-behind to do this properly.
s.gsub( /(?<=<--MARKER START-->).*?(?=<--MARKER END-->)/, 'replacement text' )
What happens in ruby 1.8 is the ?<=
causes it to crash because it doesn't understand the look-behind assertion. For that part, you then have to fall back to using a backreference - like Greig Hewgill mentions
so what you get is
s.gsub( /(<--MARKER START-->).*?(?=<--MARKER END-->)/, '\1replacement text' )
I've replaced the (.)*
in the middle of your regex with .*?
- this is non-greedy.
If you don't have non-greedy, then your regex will try and match as much as it can - if you have 2 markers on one line, it goes wrong. This is best illustrated by example:
"<b>One</b> Two <b>Three</b>".gsub( /<b>.*<\/b>/, 'BOLD' )
=> "BOLD"
What we actually want:
"<b>One</b> Two <b>Three</b>".gsub( /<b>.*?<\/b>/, 'BOLD' )
=> "BOLD Two BOLD"
zero-width-look-ahead-assertion sounds like a giant pile of nerdly confusion.
What "look-ahead-assertion" actually means is "Only match, if the thing we are looking for, is followed by this other stuff.
For example, only match a digit, if it is followed by an F.
"123F" =~ /\d(?=F)/ # will match the 3, but not the 1 or the 2
What "zero width" actually means is "consider the 'followed by' in our search, but don't count it as part of the match when doing replacement or grouping or things like that. Using the same example of 123F, If we didn't use the lookahead assertion, and instead just do this:
"123F" =~ /\dF/ # will match 3F, because F is considered part of the match
As you can see, this is ideal for checking for our <--MARKER END-->
, but what we need for the <--MARKER START-->
is the ability to say "Only match, if the thing we are looking for FOLLOWS this other stuff". That's called a look-behind assertion, which ruby 1.8 doesn't have for some strange reason..
Hope that makes sense :-)
PS: Why use lookahead assertions instead of just backreferences? If you use lookahead, you're not actually replacing the <--MARKER-->
bits, only the contents. If you use backreferences, you are replacing the whole lot. I don't know if this incurs much of a performance hit, but from a programming point of view it seems like the right thing to do, as we don't actually want to be replacing the markers at all.
You could do something like this:
my_string.gsub(/(<--MARKER_START-->)(.*)(<--MARKER_END-->)/, '\1replace_text\3')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With