
How to loop through files that match a regular expression in a Unix shell script

I want to be able to loop through a list of files that match a particular pattern. I can get unix to list these files using ls and egrep with a regular expression, but I cannot find a way to turn this into an iterative process. I suspect that using ls is not the answer. Any help would be gratefully received.

My current ls command looks as follows:

ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat'

I would expect the above to match:

  • MYFILE160418.dat
  • myFILE170312.DAT
  • MyFiLe160416.DaT

but not:

  • MYOTHERFILE150202.DAT
  • Myfile.dat
  • myfile.csv

Thanks,

Paul.

asked Apr 20 '16 12:04 by paul frith



2 Answers

You can use (GNU) find with the regex search option instead of parsing ls.

find . -regextype "egrep" \
       -iregex '.*/MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat' \
       -exec [[whatever you want to do]] {} \;

Where [[whatever you want to do]] is the command you want to perform on the names of the files.
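If the "whatever you want to do" part is several commands rather than one, a common pattern is to hand the matches to an inline shell script and loop there. The following is only a sketch under a few assumptions: GNU find is available, the scratch directory and sample file names are made up for illustration, and `-maxdepth 1` is added so the search stays in one directory the way `ls` would.

```shell
# Scratch directory with sample names (made up for this illustration).
dir=$(mktemp -d)
touch "$dir/MYFILE160418.dat" "$dir/myFILE170312.DAT" \
      "$dir/MYOTHERFILE150202.DAT" "$dir/Myfile.dat" "$dir/myfile.csv"

# -exec sh -c '...' sh {} + passes the matched paths to the inline script
# as "$@", so each name reaches the loop body intact, spaces included.
found=$(find "$dir" -maxdepth 1 -regextype posix-egrep -type f \
    -iregex '.*/MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat' \
    -exec sh -c 'for f in "$@"; do echo "item: ${f##*/}"; done' sh {} + | sort)
echo "$found"

rm -rf "$dir"
```

The `echo` inside the inline script stands in for the real work; `${f##*/}` strips the directory part so only the file name is printed.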

From the man page:

  -regextype type
          Changes  the regular expression syntax understood by -regex and -iregex tests 
          which occur later on the command line.  Currently-implemented types are 
          emacs (this is the default), posix-awk, posix-basic, posix-egrep and 
          posix-extended.

  -regex pattern
          File name matches regular expression pattern.  This is a match on the whole 
          path, not a search.  For example, to match a file named `./fubar3', you can 
          use the regular expression
          `.*bar.' or `.*b.*3', but not `f.*r3'.  The regular expressions understood by 
          find are by default Emacs Regular Expressions, but this can be changed with 
          the -regextype option.

  -iregex pattern
          Like -regex, but the match is case insensitive.
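
The "match on the whole path" point is why the pattern in this answer starts with `.*/`. A small sketch of the man page's own `fubar3` example, using a made-up scratch directory (the variable names are illustrative only):

```shell
# Scratch directory containing the man page's example file name.
dir=$(mktemp -d)
touch "$dir/fubar3"

# Matches only the tail of the path, so find reports nothing.
partial=$(find "$dir" -type f -regex 'f.*r3')
# The leading .* absorbs the directory part, so the whole path matches.
whole=$(find "$dir" -type f -regex '.*b.*3')

echo "partial: [${partial##*/}]"
echo "whole:   [${whole##*/}]"

rm -rf "$dir"
```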
answered Oct 24 '22 00:10 by 123


Based on the link Andy K provided, I used the following to loop over the files that match my criteria:

# Note: $(...) is split on whitespace, so this assumes names without spaces.
for i in $(ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat'); do
    echo "item: $i"
done
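
As the question suspected, parsing `ls` is fragile once a name contains whitespace. One possible alternative, sketched here with a made-up scratch directory and sample names, is to let the shell glob every entry and test each name with `grep -E` (the current spelling of `egrep`). The `^` and `$` anchors are an addition so that the whole name has to match.

```shell
# Scratch directory with sample names (made up for this illustration).
dir=$(mktemp -d)
touch "$dir/MYFILE160418.dat" "$dir/MyFiLe160416.DaT" \
      "$dir/MYOTHERFILE150202.DAT" "$dir/my file.dat" "$dir/myfile.csv"

pattern='^MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat$'
count=0
for path in "$dir"/*; do
    name=${path##*/}
    # -E: extended regex, -i: ignore case, -q: exit status only, no output.
    if printf '%s\n' "$name" | grep -Eiq "$pattern"; then
        echo "item: $name"
        count=$((count + 1))
    fi
done
echo "matched $count file(s)"

rm -rf "$dir"
```

This starts one `grep` per file, which is fine for a handful of files; for large directories the `find -iregex` approach does the filtering in a single process.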
answered Oct 24 '22 00:10 by paul frith