
Replacing partial regex matches in place with Ruby

I want to transform the following text

This is a ![foto](foto.jpeg), here is another ![foto](foto.png)

into

This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)

In other words, I want to find all the image paths that are enclosed in parentheses (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.

I would like to do this using String#gsub in its block version. Currently my code looks like this:

re = /!\[.*?\]\((.*?)\)/

rel_content = content.gsub(re) do |path|
    real_path(path)
end

The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.

My current workaround is to split the path and reassemble it later.

Is there a Ruby regex that matches only the path inside the parentheses and not all the required surrounding characters?

Post-answers update: The main problem here is that Ruby's regexen cannot use a variable-width pattern such as ![...]( inside a zero-width lookbehind (lookbehinds themselves are supported, but only with fixed-width content). The most generic solution is to group the part of the regexp before the real matching part and the part after it, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.

In this case the solution would be

re = /(!\[.*?\]\()(.*?)(\))/

rel_content = content.gsub(re) do
    $1 + real_path($2) + $3
end
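
For reference, here is a self-contained sketch of the snippet above. The real_path below is only a hypothetical stand-in (the real function is not shown in this question); its extension-to-folder mapping is an assumption chosen to reproduce the example output.

```ruby
# Hypothetical stand-in for real_path: maps .png files to /folder2 and
# everything else to /folder1 (an assumption made for this example only).
def real_path(path)
  folder = File.extname(path) == '.png' ? '/folder2' : '/folder1'
  "#{folder}/#{path}"
end

content = 'This is a ![foto](foto.jpeg), here is another ![foto](foto.png)'

# Group 1: everything up to and including "(", group 2: the path, group 3: ")".
re = /(!\[.*?\]\()(.*?)(\))/

# Only group 2 is transformed; groups 1 and 3 are put back unchanged.
rel_content = content.gsub(re) { $1 + real_path($2) + $3 }

puts rel_content
```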
gioele asked Dec 11 '11 19:12


4 Answers

A quick solution (adjust as necessary):

s = 'This is a ![foto](foto.jpeg)'

# The "!" is inside group 1 so that it is kept in the result.
s.sub!(/(!\[.*?\])\((.*?)\)/, '\1(/folder1/\2)')

p s  # "This is a ![foto](/folder1/foto.jpeg)"
Marek Příhoda answered Oct 12 '22 17:10


As a side note, some people think backreferences like '\1' are unsuitable when the text around the match has no fixed length. For example, if you want to match and modify content in the middle of a string, how can you protect the characters on both sides?

It's easy: put capture groups (parentheses) around the surrounding parts as well.

For example, I want to turn a-ruby-porgramming-book-531070.png into a-ruby-porgramming-book.png, i.e. remove everything from the last "-" up to the last ".".

I can use /.*(-.*?)\./ to capture -531070. But how should I replace it? Note that the rest of the string has no fixed format.

The answer is to capture the surrounding part in a group too, then put it back in the replacement:

"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.') 
# => "a-ruby-porgramming-book.png"

If you want to add something before the matched content, you can use:

"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"
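
One detail that may be worth keeping in mind with this approach: the backreference only works if the backslash actually reaches sub. A small sketch of the quoting variants (the variable names are just for illustration):

```ruby
name = 'a-ruby-porgramming-book-531070.png'
re   = /(.*)(-.*?)\./

# Single quotes keep the backslash, so sub sees the backreference \1.
single = name.sub(re, '\1.')

# In double quotes the backslash has to be escaped to get the same string.
double = name.sub(re, "\\1.")

# The block form avoids the escaping question by using $1 directly.
block = name.sub(re) { "#{$1}." }

# Pitfall: "\1" in double quotes is the control character U+0001,
# not a backreference, so the whole match is replaced by that character.
wrong = name.sub(re, "\1.")

p single, double, block, wrong
```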
NaN answered Oct 12 '22 16:10


In your block, use $1 to access the first capture group ($2 for the second and so on).

From the documentation:

In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
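
A minimal sketch of what that means in practice: the block parameter is the whole match, while $1 and $2 hold the capture groups. The /folder1/ prefix is taken from the question; the two-group regex is my own variation, used here only to show both variables.

```ruby
text = 'a ![foto](foto.jpeg) b'
seen = []

result = text.gsub(/!\[(.*?)\]\((.*?)\)/) do |whole|
  # whole is the full match; $1 is the alt text, $2 is the path.
  seen << [whole, $1, $2]
  # The block's return value replaces the full match, so the
  # surrounding syntax has to be rebuilt here.
  "![#{$1}](/folder1/#{$2})"
end

p seen
p result
```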

Dominik Honnef answered Oct 12 '22 16:10


You can always do it in two steps: first extract the whole image expression, then replace the link inside it:

str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"

str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
  image.gsub(/(?<=\()(.*)(?=\))/) do |link|
    "/a/new/path/" + link
  end
end

#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"

I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.

[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):

You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regular expression subject to the following condition: expressions in lookbehinds have to be fixed width due to limitations of the regex implementation, which means that they can't include expressions with an unknown number of repetitions or nested alternations with different-width choices (alternatives of different lengths are accepted only at the top level of the lookbehind). If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads, though.)
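
A small sketch of that restriction, as far as I know it applies to current Ruby versions. The variable-width pattern is built from a string with Regexp.new so that the failure can be rescued at run time (written as a regex literal, it would be rejected when the file is parsed).

```ruby
# Fixed-width lookbehind: just the single character "(" is accepted.
fixed = 'see (foto.jpeg)'[/(?<=\()[^)]*/]

# Variable-width lookbehind: ".*?" has no fixed length, so compiling
# the pattern is expected to raise a RegexpError.
begin
  Regexp.new('(?<=!\[.*?\]\()[^)]*')
  variable_width_error = nil
rescue RegexpError => e
  variable_width_error = e.message
end

p fixed
p variable_width_error
```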

In your case, the [foto] part has a variable width (foto can be any string), so it can't go into a lookbehind. However, a lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex, which only needs to look behind for the (fixed-length) opening parenthesis.
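
As an aside, and not what the code in this answer does: Ruby 2.0 and later also accept \K, which discards everything matched so far from the reported match. That seems to allow a variable-width prefix in a single regex. A sketch under that assumption, reusing the example prefix from above:

```ruby
str = 'This is a ![foto](foto.jpeg), here is another ![foto](foto.png)'

# Everything before \K must match but is not part of the match that
# gsub replaces, so the block only sees (and replaces) the path.
result = str.gsub(/!\[[^\]]*\]\(\K[^)]*/) do |link|
  '/a/new/path/' + link
end

puts result
```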

Obviously you can put real_path in from here, but I just wanted a testable example.
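
For completeness, a sketch of the two-step version with real_path plugged in. The real_path defined here is a hypothetical stand-in (the asker's function is not shown), with a folder mapping assumed only to match the expected output in the question.

```ruby
# Hypothetical stand-in for the asker's real_path (assumed mapping).
def real_path(path)
  folder = File.extname(path) == '.png' ? '/folder2' : '/folder1'
  "#{folder}/#{path}"
end

str = 'This is a ![foto](foto.jpeg), here is another ![foto](foto.png)'

# Step 1: find each whole image expression.
result = str.gsub(/!\[[^\]]*\]\(([^)]*)\)/) do |image|
  # Step 2: inside it, replace only what sits between "(" and ")".
  image.gsub(/(?<=\()(.*)(?=\))/) { |link| real_path(link) }
end

puts result
```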

I think that this approach is more flexible and more readable than reconstructing the string through the match group variables.

Carl Suster answered Oct 12 '22 16:10