Replacing partial regex matches in place with Ruby

Question

I want to transform the following text

This is a ![foto](foto.jpeg), here is another ![foto](foto.png)

into

This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)

In other words, I want to find all the image paths enclosed in parentheses (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.

I would like to do this using String#gsub in its block version. Currently my code looks like this:

re = /!\[.*?\]\((.*?)\)/

rel_content = content.gsub(re) do |path|
    real_path(path)
end

The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.

My current workaround is to split the path and reassemble it later.

Is there a Ruby regex that matches only the path inside the parentheses, without the surrounding characters that are needed for context?

Post-answers update: The main problem here is that a zero-width lookbehind cannot cover the part before the path: Ruby's lookbehinds, where available, must be fixed-width (see Carl Suster's answer), and the ![...]( prefix is not. The most generic solution is to group the part of the regexp before the real matching part and the part after it, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.

In this case the solution would be

re = /(!\[.*?\]\()(.*?)(\))/

rel_content = content.gsub(re) do
    $1 + real_path($2) + $3
end
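For anyone who wants to run the grouped version end to end, here is a minimal sketch. The real_path below is only a hypothetical stand-in (the question does not show the real one), and the FOLDERS table and rewrite_image_paths name are made up for the example.

```ruby
# Hypothetical stand-in for the question's real_path: maps a bare file name
# to its full path. The folder assignments are illustrative only.
FOLDERS = { 'foto.jpeg' => '/folder1', 'foto.png' => '/folder2' }.freeze

def real_path(path)
  "#{FOLDERS.fetch(path)}/#{path}"
end

# Group 1: "![alt](", group 2: the path, group 3: the closing ")".
IMAGE_RE = /(!\[.*?\]\()(.*?)(\))/

def rewrite_image_paths(content)
  content.gsub(IMAGE_RE) do
    # $1..$3 refer to the match that is currently being replaced.
    $1 + real_path($2) + $3
  end
end

puts rewrite_image_paths('This is a ![foto](foto.jpeg), here is another ![foto](foto.png)')
```

Only group 2 is passed through real_path; groups 1 and 3 are written back unchanged, so the text around each path is preserved.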

Marek Příhoda · Accepted Answer

A quick solution (adjust as necessary):

s = 'This is a ![foto](foto.jpeg)'

s.sub!(/(!\[.*?\])\((.*?)\)/, '\1(/folder1/\2)')

p s  # "This is a ![foto](/folder1/foto.jpeg)"

NaN · Answer

As a side note, some people think '\1' is unsuitable when the match contains a variable number of characters. For example, if you want to match and modify the content in the middle, how do you preserve the characters on either side?

It's easy: put parentheses (a capture group) around the parts you want to keep.

For example, suppose I want to turn a-ruby-porgramming-book-531070.png into a-ruby-porgramming-book.png, i.e. remove everything from the last "-" up to the last ".".

I can use /.*(-.*?)\./ to capture -531070. But how do I replace it? Note that the rest of the name has no fixed format.

The answer is to put the rest in a capture group as well, then write it back with a backreference:

"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.') 
# => "a-ruby-porgramming-book.png"

If you want to add something before the matched content, you can use:

"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"
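The same two substitutions can also be written with named groups, which some find easier to read than \1 and \2. This is just a sketch: the group names stem and suffix and both helper names are made up for the example.

```ruby
# Same pattern as above, with named groups instead of numbered ones.
FILENAME_RE = /(?<stem>.*)(?<suffix>-.*?)\./

# Drop the last "-..." segment before the extension.
def strip_suffix(name)
  # \k<stem> in a replacement string refers to the named group.
  name.sub(FILENAME_RE, '\k<stem>.')
end

# Insert extra text in front of the last "-..." segment.
def insert_before_suffix(name, extra)
  # Block form: the return value is inserted literally, so backslashes
  # in `extra` are not interpreted as backreferences.
  name.sub(FILENAME_RE) { "#{$~[:stem]}#{extra}#{$~[:suffix]}." }
end

puts strip_suffix('a-ruby-porgramming-book-531070.png')
puts insert_before_suffix('a-ruby-porgramming-book-531070.png', '-2019')
```

The block form is the safer choice when the inserted text comes from somewhere else, since a replacement string would treat sequences like \1 in it specially.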

Dominik Honnef · Answer

In your block, use $1 to access the first capture group ($2 for the second and so on).

From the documentation:

In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
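To see what each of these holds, here is a small sketch that records the block parameter and the capture groups for every match without changing the text. The describe_matches name and the hash keys are made up for the example.

```ruby
# Collect what gsub's block sees for each Markdown image in the text.
def describe_matches(text)
  seen = []
  text.gsub(/!\[(.*?)\]\((.*?)\)/) do |whole|
    # `whole` is the entire match (the same string as $&);
    # $1 and $2 are the first and second capture groups.
    seen << { whole: whole, alt: $1, path: $2 }
    whole # returning the match unchanged leaves the text as it was
  end
  seen
end

p describe_matches('This is a ![foto](foto.jpeg), here is another ![foto](foto.png)')
```

This is why the block parameter in the question received ![foto](foto.jpeg): the parameter is always the whole match, and the path alone is only available through the capture group.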

Carl Suster · Answer

You can always do it in two steps: first extract the whole image expression, then replace the link inside it:

str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"

str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
  image.gsub(/(?<=\()(.*)(?=\))/) do |link|
    "/a/new/path/" + link
  end
end

#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"

I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.

[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):

You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regular expression subject to the following condition: expressions in lookbehinds have to be fixed-width due to limitations of the regex implementation, which means that they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads, though.)

In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
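A small sketch of the difference, assuming a Ruby version with lookbehind support. The helper names prefix_link and compiles? are made up for the example. The second pattern puts the variable-width alt text inside the lookbehind and is expected to be rejected when compiled, so the sketch catches RegexpError instead of assuming a particular error message.

```ruby
# Fixed-width lookbehind: "(" is always one character, so this compiles.
PATH_IN_PARENS = /(?<=\()[^)]*(?=\))/

# Prepend a prefix to the link inside a single image expression.
def prefix_link(image, prefix)
  image.sub(PATH_IN_PARENS) { |link| prefix + link }
end

# Report whether a pattern source compiles, without raising.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

puts prefix_link('![foto](foto.jpeg)', '/a/new/path/')

# One-character lookbehind.
puts compiles?('(?<=\()[^)]*')
# Lookbehind containing ".*?", which has no fixed width.
puts compiles?('(?<=!\[.*?\]\()[^)]*')
```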

Obviously you can put real_path in from here, but I just wanted a testable example.

I think that this approach is more flexible and more readable than reconstructing the string through the match group variables.

Tags: regex, replace, markdown, ruby

gioele
