gsub partial replace

Tags:

regex

ruby

I would like to replace only the group in parentheses in this expression:

my_string.gsub(/<--MARKER_START-->(.)*<--MARKER_END-->/, 'replace_text')

so that I get: <--MARKER_START-->replace_text<--MARKER_END-->

I know I could repeat the whole MARKER_START and MARKER_END blocks in the substitution expression, but I thought there should be a simpler way to do this.

Pierre Olivier Martel asked Sep 23 '08 02:09


2 Answers

You can do it with zero-width look-ahead and look-behind assertions.

This regex should work in Ruby 1.9, Perl, and many other places:

Note: Ruby 1.8 only supports look-ahead assertions. You need both look-ahead and look-behind to do this properly.

 s.gsub(/(?<=<--MARKER_START-->).*?(?=<--MARKER_END-->)/, 'replacement text')
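A minimal runnable sketch of the look-behind/look-ahead version, assuming Ruby 1.9 or later. The input string is a hypothetical example with two marker pairs on one line, so it also exercises the non-greedy `.*?`:

```ruby
# Hypothetical input: two marker pairs on the same line.
text = "a <--MARKER_START-->old one<--MARKER_END--> b <--MARKER_START-->old two<--MARKER_END-->"

# (?<=...) requires the start marker right before the match and
# (?=...) requires the end marker right after it. Neither marker is
# consumed, so gsub only replaces the text between them.
result = text.gsub(/(?<=<--MARKER_START-->).*?(?=<--MARKER_END-->)/, 'replacement text')

puts result
# => a <--MARKER_START-->replacement text<--MARKER_END--> b <--MARKER_START-->replacement text<--MARKER_END-->
```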

In Ruby 1.8, the ?<= makes the regex fail to compile, because Ruby 1.8 doesn't understand the look-behind assertion. For that part, you have to fall back to using a backreference, as Greg Hewgill mentions.

So what you get is:

 s.gsub(/(<--MARKER_START-->).*?(?=<--MARKER_END-->)/, '\1replacement text')
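A sketch of the backreference fallback on the same kind of hypothetical input, to check that it produces the same result as the look-behind version. It only uses features Ruby 1.8 also has (capture groups and look-ahead), though it was reasoned about against a modern Ruby:

```ruby
# Hypothetical input with two marker pairs.
text = "a <--MARKER_START-->old one<--MARKER_END--> b <--MARKER_START-->old two<--MARKER_END-->"

# Group 1 captures the start marker so '\1' can write it back in;
# the end marker is still only asserted by the look-ahead (?=...).
result = text.gsub(/(<--MARKER_START-->).*?(?=<--MARKER_END-->)/, '\1replacement text')

puts result
# => a <--MARKER_START-->replacement text<--MARKER_END--> b <--MARKER_START-->replacement text<--MARKER_END-->
```

Note the single quotes around the replacement: in a double-quoted Ruby string, `\1` is an octal escape, so you would have to write `"\\1replacement text"` instead.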

EXPLANATION THE FIRST:

I've replaced the (.)* in the middle of your regex with .*?, which is non-greedy. Without the non-greedy modifier, your regex will try to match as much as it can, so if you have two pairs of markers on one line, it goes wrong. This is best illustrated by example:

"<b>One</b> Two <b>Three</b>".gsub( /<b>.*<\/b>/, 'BOLD' )
=> "BOLD"

What we actually want:

"<b>One</b> Two <b>Three</b>".gsub( /<b>.*?<\/b>/, 'BOLD' )
=> "BOLD Two BOLD"
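One related detail about the original (.)*: when a capture group is repeated by a quantifier, the group keeps only its last iteration. A small sketch on a hypothetical string, assuming a modern Ruby:

```ruby
s = "<--MARKER_START-->abc<--MARKER_END-->"

# (.)* repeats a one-character group, so group 1 ends up holding
# only the character from the final iteration.
repeated = s.match(/<--MARKER_START-->(.)*<--MARKER_END-->/)

# (.*?) wraps the whole repetition in one group, so it captures everything.
wrapped = s.match(/<--MARKER_START-->(.*?)<--MARKER_END-->/)

puts repeated[1] # => c
puts wrapped[1]  # => abc
```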

EXPLANATION THE SECOND:

"Zero-width look-ahead assertion" sounds like a giant pile of nerdy confusion.

What "look-ahead assertion" actually means is "only match if the thing we are looking for is followed by this other stuff."

For example, only match a digit if it is followed by an F:

"123F" =~ /\d(?=F)/ # will match the 3, but not the 1 or the 2
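A quick sketch of what that look-ahead selects, using `match` and `scan` on hypothetical strings:

```ruby
# Only a digit immediately followed by F qualifies.
m = "123F".match(/\d(?=F)/)
puts m[0]        # => 3
puts m.begin(0)  # => 2 (the index of "3")

# scan finds every digit that has an F right after it.
digits = "1F2F3".scan(/\d(?=F)/)
p digits         # => ["1", "2"]
```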

What "zero width" actually means is "consider the 'followed by' in our search, but don't count it as part of the match when doing replacement or grouping or things like that." Using the same example of 123F, if we didn't use the look-ahead assertion and instead just did this:

"123F" =~ /\dF/ # will match 3F, because F is considered part of the match
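The difference shows up most clearly when replacing. A small sketch, assuming a modern Ruby:

```ruby
# Zero-width: F is checked but not consumed, so it survives the replacement.
with_lookahead = "123F".sub(/\d(?=F)/, 'X')
puts with_lookahead  # => 12XF

# Plain F: it is part of the match, so it gets replaced too.
without_lookahead = "123F".sub(/\dF/, 'X')
puts without_lookahead  # => 12X
```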

As you can see, this is ideal for checking for our <--MARKER_END-->, but for the <--MARKER_START--> we need the ability to say "only match if the thing we are looking for FOLLOWS this other stuff." That's called a look-behind assertion, which Ruby 1.8 doesn't have for some strange reason.
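The mirror image of the F example, as a sketch that assumes Ruby 1.9 or later (look-behind won't parse in 1.8). It matches a digit only if it follows an F:

```ruby
# (?<=F) requires an F immediately before the digit, without consuming it.
found = "F1F2 3".scan(/(?<=F)\d/)
p found  # => ["1", "2"] ("3" follows a space, not an F)

# Because the F is zero-width, it is kept when replacing.
replaced = "F123".sub(/(?<=F)\d/, 'X')
puts replaced  # => FX23
```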

Hope that makes sense :-)

PS: Why use look-around assertions instead of just backreferences? If you use look-around, you're not actually replacing the <--MARKER--> bits, only the contents. If you use backreferences, you match the markers along with the contents and then have to write the markers back in. I don't know if this incurs much of a performance hit, but from a programming point of view it seems like the right thing to do, as we don't actually want to be replacing the markers at all.
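One way to see this is to look at what each pattern actually matches, i.e. what `gsub` would throw away. A sketch on a hypothetical string, assuming Ruby 1.9 or later:

```ruby
s = "<--MARKER_START-->old<--MARKER_END-->"

# Look-around: the match is just the contents between the markers.
look_match = s[/(?<=<--MARKER_START-->).*?(?=<--MARKER_END-->)/]
puts look_match     # => old

# Backreference version: the start marker is part of the match,
# which is why the replacement has to put it back with \1.
backref_match = s[/(<--MARKER_START-->).*?(?=<--MARKER_END-->)/]
puts backref_match  # => <--MARKER_START-->old
```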

Orion Edwards answered Oct 02 '22 07:10


You could do something like this:

my_string.gsub(/(<--MARKER_START-->)(.*)(<--MARKER_END-->)/, '\1replace_text\3')
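A quick check of this approach on two hypothetical inputs. With one marker pair it gives exactly the requested output; with two pairs on one line, the greedy (.*) runs from the first start marker to the last end marker, as described in the other answer. The lazy (.*?) variant at the end is our tweak, not part of this answer:

```ruby
single = "<--MARKER_START-->old<--MARKER_END-->"
double = "<--MARKER_START-->x<--MARKER_END--> and <--MARKER_START-->y<--MARKER_END-->"

greedy = /(<--MARKER_START-->)(.*)(<--MARKER_END-->)/
lazy   = /(<--MARKER_START-->)(.*?)(<--MARKER_END-->)/

# \1 and \3 write the captured markers back around the new text.
one_pair  = single.gsub(greedy, '\1replace_text\3')
merged    = double.gsub(greedy, '\1replace_text\3')  # both pairs collapse into one
separated = double.gsub(lazy, '\1replace_text\3')    # each pair replaced on its own

puts one_pair   # => <--MARKER_START-->replace_text<--MARKER_END-->
puts merged     # => <--MARKER_START-->replace_text<--MARKER_END-->
puts separated  # => <--MARKER_START-->replace_text<--MARKER_END--> and <--MARKER_START-->replace_text<--MARKER_END-->
```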
Greg Hewgill answered Oct 02 '22 08:10