Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex matching in a Bash if statement

People also ask

How do you check if a string matches a regex in bash?

You can use the test construct, [[ ]] , along with the regular expression match operator, =~ , to check if a string matches a regex pattern (documentation). where commands after && are executed if the test is successful, and commands after || are executed if the test is unsuccessful.

Can you use regex in bash?

Regex is a very powerful tool that is available at our disposal & the best thing about using regex is that they can be used in almost every computer language. So if you are Bash Scripting or creating a Python program, we can use regex or we can also write a single line search query.

CAN YOU DO IF statements in regex?

If-Then-Else Conditionals in Regular Expressions. A special construct (? ifthen|else) allows you to create conditional regular expressions. If the if part evaluates to true, then the regex engine will attempt to match the then part.

What is bash if [- N?

A null string in Bash can be declared by equalizing a variable to “”. Then we have an “if” statement followed by the “-n” flag, which returns true if a string is not null. We have used this flag to test our string “name,” which is null.


There are a couple of important things to know about bash's [[ ]] construction. The first:

Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

The second thing:

An additional binary operator, ‘=~’, is available,... the string to the right of the operator is considered an extended regular expression and matched accordingly... Any part of the pattern may be quoted to force it to be matched as a string.

Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.

So if you write: [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or punctuation symbol whose ascii value is less than a digit). If you quote the right-hand side like-so [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]]

Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted. So spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...

For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:

# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...

Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...

! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".

The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put ] into $valid, even if $valid were quoted, except at the very beginning. (That's a Posix regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-])


In case someone wanted an example using variables...

#!/bin/bash

# Only continue for 'develop' or 'release/*' branches
BRANCH_REGEX="^(develop$|release//*)"

if [[ $BRANCH =~ $BRANCH_REGEX ]];
then
    echo "BRANCH '$BRANCH' matches BRANCH_REGEX '$BRANCH_REGEX'"
else
    echo "BRANCH '$BRANCH' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
fi

I'd prefer to use [:punct:] for that. Also, a-zA-Z09-9 could be just [:alnum:]:

[[ $TEST =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]

Or you might be looking at this question because you happened to make a silly typo like I did and have the =~ reversed to ~=