Bash Regular Expression Condition

I have a regular expression that I need to verify. The regular expression has double quotes in it, but I can't seem to figure out how to properly escape them.

My first attempt doesn't work because the quotes are not escaped:

while read line
do
  if [[ $line =~ "<a href="(.+)">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {"link.html"} (with double quotes)

How can I run this properly so the output is link.html, without the double quotes?

I have tried...

while read line
do
  if [[ $line =~ "<a href=/"(.+)/">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {}

Without luck. Can someone please help me so I can stop beating my head on my desk? I am not great with Bash. Thank you!

jayem asked Mar 03 '11 20:03



1 Answer

It's always best to put your regex in a variable.

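# Keep the regex in a variable; single quotes preserve the double quotes literally.
pattern='<a href="(.+)">HTTP</a>'
while IFS= read -r line
do
  # $pattern must stay unquoted so it is treated as a regex.
  if [[ $line =~ $pattern ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html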

echo "{$SOURCE}" #output = {link.html} (without double quotes)
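
If you want to try this without a tmp/source.html file, here is a minimal self-contained sketch. The here-document contents are hypothetical sample lines made up for illustration, not the asker's real file.

```shell
#!/usr/bin/env bash
pattern='<a href="(.+)">HTTP</a>'
SOURCE=''

# Hypothetical sample input standing in for tmp/source.html.
while IFS= read -r line; do
  if [[ $line =~ $pattern ]]; then
    SOURCE=${BASH_REMATCH[1]}   # first capture group
    break
  fi
done <<'EOF'
<html>
<body>
<a href="other.html">FTP</a>
<a href="link.html">HTTP</a>
</body>
</html>
EOF

echo "{$SOURCE}"   # prints {link.html}
```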

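If you quote the right-hand side (the pattern), the quoted part is matched as a literal string instead of being interpreted as a regex (=~ effectively becomes a plain string comparison). In your first attempt the inner double quotes only end and reopen the quoted string, so the shell strips them and only (.+) is treated as regex. The pattern that actually runs is <a href=(.+)>HTTP</a>, which is why the captured text still includes the double quotes.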
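
A small sketch of the difference, assuming bash 3.2 or later with default compatibility settings. The sample line and the result variable names are made up for illustration.

```shell
#!/usr/bin/env bash
line='<a href="link.html">HTTP</a>'   # hypothetical sample line
pattern='<a href="(.+)">HTTP</a>'

# Unquoted: $pattern is interpreted as a regular expression.
if [[ $line =~ $pattern ]]; then
  unquoted_result=${BASH_REMATCH[1]}
else
  unquoted_result='no match'
fi

# Quoted: the expansion is matched literally, so the characters "(.+)"
# would have to appear verbatim in the line.
if [[ $line =~ "$pattern" ]]; then
  quoted_result='match'
else
  quoted_result='no match'
fi

echo "unquoted: $unquoted_result"   # prints unquoted: link.html
echo "quoted:   $quoted_result"     # prints quoted:   no match
```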

As a side note, escaping is done with backslashes (\) rather than slashes (/), but that would not help your situation because of the outer quotes as mentioned in my previous paragraph.
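
Backslash escaping does work when assigning the pattern variable inside double quotes. The sketch below (sample line and variable names are hypothetical) shows that it produces the same pattern string as the single-quoted form, which is then used unquoted in the test.

```shell
#!/usr/bin/env bash
line='<a href="link.html">HTTP</a>'   # hypothetical sample line

pattern_single='<a href="(.+)">HTTP</a>'       # single quotes: no escaping needed
pattern_escaped="<a href=\"(.+)\">HTTP</a>"    # double quotes: escape inner " with \

# Both assignments yield the same string.
if [[ $pattern_single == "$pattern_escaped" ]]; then
  same='yes'
else
  same='no'
fi

SOURCE=''
if [[ $line =~ $pattern_escaped ]]; then
  SOURCE=${BASH_REMATCH[1]}
fi

echo "same pattern: $same"   # prints same pattern: yes
echo "{$SOURCE}"             # prints {link.html}
```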

Dennis Williamson answered Oct 29 '22 13:10