Caret will not work in Bash regular expression?

Tags: regex, bash, caret

I'm trying to match lines corresponding to image placements in Markdown files, so I can replace the address of each image, where appropriate, with a value from an array. The lines look like this:

![Alt text.](/!/image.jpg)

Note that the image address itself, within the parentheses, contains an exclamation mark; this indicates that it needs to be replaced with a real address. So image.jpg acts as the key for an array I have created.

Say the value for the key image.jpg is http://images.com/an-example-image.jpg. The desired result for my Bash script would be:

![Alt text.](http://images.com/an-example-image.jpg) 
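The key-to-address lookup described above can be held in a Bash associative array, which needs Bash 4 or later. This is a minimal sketch that assumes the array is called imagemap, as in the loop in the question, and holds only the single example entry given there:

```bash
#!/usr/bin/env bash
# Associative arrays need Bash 4+; imagemap holds the example mapping from the question.
declare -A imagemap=(
  [image.jpg]='http://images.com/an-example-image.jpg'
)

key='image.jpg'
# Look up the real address for the placeholder key.
printf '%s\n' "${imagemap[$key]}"
```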

I've been using Bash's =~ conditional operator to do this:

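testfile=$(<test-md.md)
# Groups: 1 = text before the image's "!", 2 = "[Alt text.](", 3 = the key after /!/, 4 = ")" and the rest
re='(.*)\!(.*\()\/\!\/([0-9a-z\.\-]+)(\).*)'
# Keep substituting until no /!/ placeholder matches; imagemap is the array described above
while [[ $testfile =~ $re ]]; do
    testfile=${BASH_REMATCH[1]}"!"${BASH_REMATCH[2]}${imagemap[${BASH_REMATCH[3]}]}${BASH_REMATCH[4]}
done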

So far so good.

However, I don't want to match these lines if they're part of a blockquote or a code block. I only want the ones that Markdown would render as an actual image.

I thought I could avoid this by insisting that the exclamation mark that begins the image placement be at the very start of the line. Here's the regular expression I've tried:

re='(.*)^\!(.*\()\/\!\/([0-9a-z\.\-]+)(\).*)'

Unfortunately, Bash doesn't seem to recognise the caret when I do this. The replacement still works, but lines inside code blocks get replaced too. For example, this Markdown file:

![Alt text.](/!/image.jpg)

This image was placed with the following code:

    ![Alt text.](/!/image.jpg)

Unfortunately becomes this:

![Alt text.](http://images.com/an-example-image.jpg)

This image was placed with the following code:

    ![Alt text.](http://images.com/an-example-image.jpg)

It should be this:

![Alt text.](http://images.com/an-example-image.jpg)

This image was placed with the following code:

    ![Alt text.](/!/image.jpg)
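One likely explanation, offered tentatively: Bash hands =~ patterns to the system's POSIX extended regex library without the REG_NEWLINE flag, so a newline is just an ordinary character. That means ^ matches only at the very start of the whole string (not after each newline), and . matches newlines too. A small sketch on a made-up two-line string showing both effects:

```bash
#!/usr/bin/env bash
# Made-up two-line string: the "!" starts the second line, not the string.
text=$'first line\n![Alt text.](/!/image.jpg)'

anchored='^!'
if [[ $text =~ $anchored ]]; then
  caret_result='match'
else
  caret_result='no match'   # ^ only anchors at the start of $text
fi

dotnl='line.!'
if [[ $text =~ $dotnl ]]; then
  dot_result='match'        # . matched the newline between the two lines
else
  dot_result='no match'
fi

printf 'caret: %s, dot: %s\n' "$caret_result" "$dot_result"
```

With the question's pattern, the (.*) before ^ can therefore only ever be empty, while the greedy (.*\() is free to run across lines to a later opening parenthesis, such as the one in the code block.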

I've also tried using a line-break character class instead of the caret:

re='(.*)[\n\r]+\!(.*\()\/\!\/([0-9a-z\.\-]+)(\).*)'

That doesn't work either, so it could be that I've missed something important about Bash regular expressions in general.
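Part of the problem with this attempt is probably quoting. Inside single quotes, \n and \r are not turned into control characters, and in a POSIX bracket expression a backslash is an ordinary character, so [\n\r] matches a backslash, an n or an r. A sketch on a made-up string, assuming the aim is to put a real newline into the pattern using Bash's $'...' quoting:

```bash
#!/usr/bin/env bash
# Made-up input: the image placement is on the second line.
text=$'intro\n![Alt text.](/!/image.jpg)'

# Inside single quotes this bracket holds the characters \ n \ r, not a newline.
literal='[\n\r]!'
[[ $text =~ $literal ]] && literal_result='match' || literal_result='no match'

# $'\n' expands to an actual newline, which the regex treats as a literal character.
nl=$'\n'
real=${nl}'!'
[[ $text =~ $real ]] && real_result='match' || real_result='no match'

printf 'literal: %s, real newline: %s\n' "$literal_result" "$real_result"
```

Even with a real newline, such a pattern needs a character before the !, so on its own it would not match an image on the first line of the file.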

Am I using the caret incorrectly in this case? How can I capture just those instances where the image placement starts at the beginning of a line?

asked May 03 '26 23:05 by guypursey


1 Answer

Thanks to Avinash Raj in the comments for giving me the clue to this one. I couldn't see it at first, but there seems to be no way to make the Kleene star non-greedy in a Bash regex. (Happy to be corrected if this is wrong.)
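That matches my understanding: Bash's =~ uses POSIX extended regular expressions, which have no lazy quantifier such as *?. The usual workaround is a negated bracket expression that cannot run past the delimiter. A sketch comparing the two on a made-up line with two placements:

```bash
#!/usr/bin/env bash
# Made-up line with two image placements.
line='![a](/!/one.jpg) and ![b](/!/two.jpg)'

# Greedy: .* runs to the last ")" on the line.
greedy='\((.*)\)'
[[ $line =~ $greedy ]] && greedy_capture=${BASH_REMATCH[1]}

# [^)]* cannot cross a closing parenthesis, so it stops at the first one.
bounded='\(([^)]*)\)'
[[ $line =~ $bounded ]] && bounded_capture=${BASH_REMATCH[1]}

printf 'greedy:  %s\nbounded: %s\n' "$greedy_capture" "$bounded_capture"
```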

I found that if I altered the regex to allow only printable characters between the first exclamation mark and the opening parenthesis, the capture works. Before, the pattern must have been too wide, capturing line breaks and so finding an earlier exclamation mark on a previous, unrelated line.
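The change works because [[:print:]] (printable characters, including space) does not include the newline character, so a run of it can never cross a line break. A minimal sketch with made-up strings:

```bash
#!/usr/bin/env bash
nl=$'\n'
re='^a[[:print:]]+b$'

# Printable characters between a and b on one line: matches.
[[ "a middle b" =~ $re ]] && same_line='match' || same_line='no match'
# Only a newline between a and b: [[:print:]]+ has nothing to match.
[[ "a${nl}b" =~ $re ]] && across_lines='match' || across_lines='no match'

printf 'same line: %s, across lines: %s\n' "$same_line" "$across_lines"
```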

So the correct regex is:

re='(.*^\!\[[[:print:]]+\]\()\/\!\/([0-9a-z\.\-]+)(\).*)'
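The updated loop body isn't shown here, so the following sketch assumes it simply drops the extra "!" group and uses the three capture groups of this regex. It runs that loop over the sample Markdown from the question, inlined rather than read from test-md.md, with the example imagemap entry:

```bash
#!/usr/bin/env bash
declare -A imagemap=([image.jpg]='http://images.com/an-example-image.jpg')

# Sample file from the question, inlined instead of read with $(<test-md.md).
testfile=$'![Alt text.](/!/image.jpg)\n\nThis image was placed with the following code:\n\n    ![Alt text.](/!/image.jpg)'

re='(.*^\!\[[[:print:]]+\]\()\/\!\/([0-9a-z\.\-]+)(\).*)'
# Assumed loop body: group 1 = up to "(", group 2 = key, group 3 = ")" and the rest.
while [[ $testfile =~ $re ]]; do
  testfile=${BASH_REMATCH[1]}${imagemap[${BASH_REMATCH[2]}]}${BASH_REMATCH[3]}
done

printf '%s\n' "$testfile"
```

Only the first placement is rewritten; the indented code line keeps its /!/image.jpg placeholder.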

With this in place, the caret works and only image placements at the start of a line are found and replaced accordingly.
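One caveat, stated tentatively: as far as I can tell, Bash compiles =~ patterns without REG_NEWLINE, so ^ here anchors at the start of the whole string rather than at each line. The regex above works for the sample because the real image is the first thing in the file; a placement further down would not be matched. If that matters, one alternative (a different technique from the one above) is to test each line on its own, so that ^ naturally means "start of line". This sketch assumes Bash 4+, the example imagemap entry, and Markdown arriving on standard input; replace_images is a hypothetical helper name. Indented code and > blockquote lines never start with !, so they are left alone:

```bash
#!/usr/bin/env bash
declare -A imagemap=([image.jpg]='http://images.com/an-example-image.jpg')

# Each test sees a single line, so ^ and $ bound that line.
# Inside a bracket expression no escapes are needed: [0-9a-z.-] is digits, letters, "." and "-".
re='^(!\[[^]]*]\()/!/([0-9a-z.-]+)(\).*)$'

# Hypothetical helper: copy stdin to stdout, rewriting line-initial placeholders.
replace_images() {
  local line key
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      key=${BASH_REMATCH[2]}
      # Only rewrite keys present in the map; leave unknown keys untouched.
      if [[ -n ${imagemap[$key]+set} ]]; then
        line=${BASH_REMATCH[1]}${imagemap[$key]}${BASH_REMATCH[3]}
      fi
    fi
    printf '%s\n' "$line"
  done
}
```

It would be used as replace_images < test-md.md > out.md. Fenced code blocks (```) are not detected; handling them would need an extra flag toggled on fence lines.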

This has been driving me mad all afternoon, so many thanks, Avinash!

answered May 05 '26 11:05 by guypursey