
How to check if a file name matches regex in shell script

Tags: bash, shell

I have a shell script that needs to check if a file name matches a certain regex, but it always shows "not match". Can anyone let me know what's wrong with my code?

fileNamePattern=abcd_????_def_*.txt
realFilePath=/data/file/abcd_12bd_def_ghijk.txt

if [[ $realFilePath =~ $fileNamePattern ]]
then
    echo $realFilePath match  $fileNamePattern
else
    echo $realFilePath not match $fileNamePattern
fi
asked May 04 '16 20:05 by jlp


1 Answer

You are confusing regexes with the simpler "glob"/"wildcard"/"normal" patterns, whatever you want to call them. Your fileNamePattern is one of the latter, but =~ interprets its right-hand side as a regex.

If you want to use a pattern, you should

  • Quote it when assigning [1]:

      fileNamePattern="abcd_????_def_*.txt"
    

    You don't want anything to expand quite yet.

  • Make it match the complete path. This doesn't match:

      $ mypath="/mydir/myfile1.txt"
      $ mypattern="myfile?.txt"
      $ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
      Doesn't match!
    

    But after extending the pattern to start with *:

      $ mypattern="*myfile?.txt"
      $ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
      Matches!
    

    The first one doesn't match because == in [[ ]] has to match the entire string, and the pattern covers only the filename, not the complete path. Alternatively, you could keep the first pattern and strip the directory part of the path with parameter expansion:

      $ mypattern="myfile?.txt"
      $ mypath="/mydir/myfile1.txt"
      $ echo "${mypath##*/}"
      myfile1.txt
      $ [[ ${mypath##*/} == $mypattern ]]  && echo "Matches!" || echo "Doesn't match!"
      Matches!
    
  • Use == and not =~, as shown in the examples above. You could also use =, the operator POSIX specifies for [ ], but since we're already using the non-POSIX [[ ]] instead of [ ], we might as well use ==.
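
Applied to the file name from the question, the three points above could be combined roughly as follows. This is only a sketch: matches_glob is a hypothetical helper name introduced for illustration, and it assumes Bash because of [[ ]].

```shell
#!/usr/bin/env bash
# Sketch: glob matching against the filename part of a path.
# matches_glob is a hypothetical helper, not something Bash provides.
matches_glob() {
    local path=$1 pattern=$2
    # ${path##*/} removes the longest prefix ending in "/", leaving the filename.
    # The right-hand side stays unquoted so that it is treated as a pattern.
    [[ ${path##*/} == $pattern ]]
}

fileNamePattern="abcd_????_def_*.txt"
realFilePath=/data/file/abcd_12bd_def_ghijk.txt

if matches_glob "$realFilePath" "$fileNamePattern"; then
    echo "$realFilePath matches $fileNamePattern"
else
    echo "$realFilePath does not match $fileNamePattern"
fi
```

With the values above, the first branch is taken. The arguments are quoted at the call site so the pattern reaches the function unexpanded.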

If you want to use a regex, you should:

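  • Write your pattern as one: ? and * have a different meaning in regexes. There they are quantifiers that modify the element they follow, whereas in glob patterns they stand on their own (see the manual). A literal dot also has to be escaped, because an unescaped . matches any character. The corresponding pattern would become:

      fileNameRegex='abcd_.{4}_def_.*\.txt$'
    

    and could be used like this (=~ succeeds if the regex matches any part of the string, so the directory part needs no leading wildcard; the trailing $ anchors the match to the end of the path):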

      $ mypath="/data/file/abcd_12bd_def_ghijk.txt"
      $ [[ $mypath =~ $fileNameRegex ]] && echo "Matches!" || echo "Doesn't match!"
      Matches!
    
  • Keep your habit of storing the regex in a separate parameter and using it unquoted inside the conditional operator [[ ]]; otherwise escaping gets very messy. It's also more portable across Bash versions.
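
A minimal sketch of the regex variant, assuming Bash 3.2 or later. The helper name matches_regex and the capture group are additions for illustration and are not part of the question's script.

```shell
#!/usr/bin/env bash
# Sketch: regex matching with the regex kept in a variable.
# Single quotes keep the backslash literal; \. matches a real dot and
# $ anchors the match to the end of the path.
# The parentheses capture the four-character part (added for illustration).
fileNameRegex='abcd_(.{4})_def_.*\.txt$'

matches_regex() {   # hypothetical helper name
    # The regex variable is deliberately left unquoted.
    [[ $1 =~ $fileNameRegex ]]
}

if matches_regex /data/file/abcd_12bd_def_ghijk.txt; then
    # BASH_REMATCH[1] holds the text matched by the first group.
    echo "Matches, four-character part: ${BASH_REMATCH[1]}"
else
    echo "Doesn't match!"
fi
```

For the path shown, this prints "Matches, four-character part: 12bd".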

The BashGuide has a great article about the different types of patterns in Bash.

Note that quoting your parameters is almost always a good habit. Inside [[ ]] it is not required, because no word splitting or pathname expansion takes place there, and quoting the right-hand side actually suppresses its interpretation as a pattern or regex. If you were using [ ] (which supports neither regexes nor patterns anyway), quoting would be required to avoid unexpected side effects of special characters and empty strings.
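
The effect of quoting the right-hand side can be seen in a short sketch like the one below. The variable names are made up for illustration, and the regex case assumes Bash 3.2 or later with default compatibility settings.

```shell
#!/usr/bin/env bash
# Sketch: how quoting the right-hand side changes matching inside [[ ]].
name="myfile1.txt"
pattern="myfile?.txt"
regex='myfile.\.txt'

# Unquoted: ? acts as a wildcard.
[[ $name == $pattern ]] && unquoted="match" || unquoted="no match"
# Quoted: compared as a literal string, so ? would have to be a real "?".
[[ $name == "$pattern" ]] && quoted="match" || quoted="no match"

# Unquoted: interpreted as a regex.
[[ $name =~ $regex ]] && re_unquoted="match" || re_unquoted="no match"
# Quoted: searched for as a literal substring, backslash included.
[[ $name =~ "$regex" ]] && re_quoted="match" || re_quoted="no match"

echo "glob  unquoted: $unquoted, quoted: $quoted"
echo "regex unquoted: $re_unquoted, quoted: $re_quoted"
```

Both unquoted tests report "match" and both quoted tests report "no match".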


[1] Not exactly true in this case, actually. When assigning to a variable, the manual says that the following happens:

[...] tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal [...]

i.e., no pathname (glob) expansion. While in this very case using

fileNamePattern=abcd_????_def_*.txt

would work just as well as the quoted version, using quotes prevents surprises in many other cases and is required as soon as you have a blank in the pattern.
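
The difference between assigning and using the pattern can be checked with a sketch like this one. It works in a scratch directory created with mktemp so that exactly one file fits the pattern; the directory and variable names are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: an unquoted assignment keeps the pattern, an unquoted use expands it.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch abcd_12bd_def_ghijk.txt

# Unquoted assignment: no pathname expansion, the pattern is stored as is.
fileNamePattern=abcd_????_def_*.txt

# Unquoted use as a command argument: pathname expansion does happen.
expanded=$(echo $fileNamePattern)

echo "stored:   $fileNamePattern"
echo "expanded: $expanded"

# Clean up the scratch directory.
cd / && rm -rf "$tmpdir"
```

Here "stored" shows the pattern itself, while "expanded" shows abcd_12bd_def_ghijk.txt.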

answered Oct 06 '22 01:10 by Benjamin W.