
Bash script to iterate files in directory and pattern match filenames

Tags: bash, ubuntu

I need to process a large number of files in a directory. The files can be partitioned into several groups based on their names. That is to say, the file names can be pattern matched to determine which 'group' they belong to. For instance, the names are like this:

  • YYYYMMDD_*_bulk_import.csv
  • YYYYMMDD_*_genstats_import.csv
  • YYYYMMDD_*allstats.csv

etc ...

Each 'group' has a different processing methodology (i.e. a different command is called for processing).

I want to write a bash script to:

  1. Iterate through all CSV files in the directory
  2. Determine which 'group' a file belongs to by pattern matching its name to known patterns (like the examples I gave above)
  3. Call a command based on the determined grouping.

I am running on Ubuntu 10.04. I am new to bash, and would appreciate a skeleton code snippet to help me get started in writing this script.

Homunculus Reticulli asked Jun 25 '12 08:06




2 Answers

The easiest way is probably to iterate over each group separately, with one loop per pattern. This side-steps the parsing issue entirely.

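    DIRECTORY=.

    # YYYYMMDD stands for the date prefix; substitute a real date or a glob such as [0-9]*
    for i in "$DIRECTORY"/YYYYMMDD_*_bulk_import.csv; do
        :  # Process "$i" here
    done

    for i in "$DIRECTORY"/YYYYMMDD_*_genstats_import.csv; do
        :  # Process "$i" here
    done

    for i in "$DIRECTORY"/YYYYMMDD_*allstats.csv; do
        :  # Process "$i" here
    done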

Set DIRECTORY to whatever directory you want to search. The default . will search the current working directory.
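One caveat with per-group loops: when a pattern matches nothing, the shell leaves it unexpanded, so the loop body runs once with the literal pattern as the "file name". The following is a minimal sketch of one way to guard against that, not the answerer's own code. The names `process_bulk`, `process_genstats`, `process_allstats` and `process_groups` are hypothetical placeholders for the real per-group commands, and `[0-9]*` is an assumed stand-in for the date prefix.

```shell
#!/bin/bash
# Hypothetical handlers: replace each body with the real command for that group.
process_bulk()     { echo "bulk: $1"; }
process_genstats() { echo "genstats: $1"; }
process_allstats() { echo "allstats: $1"; }

# The function body is a subshell ( ... ), so dir and f do not leak to the caller.
process_groups() (
    dir=${1:-.}   # directory to scan; defaults to the current one

    for f in "$dir"/[0-9]*_bulk_import.csv; do
        [ -e "$f" ] || continue   # unmatched glob stays literal; skip it
        process_bulk "$f"
    done

    for f in "$dir"/[0-9]*_genstats_import.csv; do
        [ -e "$f" ] || continue
        process_genstats "$f"
    done

    for f in "$dir"/[0-9]*allstats.csv; do
        [ -e "$f" ] || continue
        process_allstats "$f"
    done
)
```

Calling `process_groups /path/to/csv/dir` would then run each handler on the files of its group. In bash, `shopt -s nullglob` is an alternative to the `[ -e "$f" ]` checks, since it makes an unmatched pattern expand to nothing.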

cdhowie answered Oct 14 '22 17:10


Here is a basic iteration over the files, with a case block to determine each file's type.

    #!/bin/bash
    for f in *; do
        case $f in
            [0-9]*_bulk_import.csv)
                echo "$f case 1"
                ;;
            [0-9]*_genstats_import.csv)
                echo "$f case 2"
                ;;
            [0-9]*allstats.csv)
                echo "$f case 3"
                ;;
        esac
    done
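A case block silently ignores names that match none of its patterns, and `[0-9]*` accepts any name that merely starts with a digit. As a rough sketch of a stricter variant (an illustration, not part of the answer above), the classification can be pulled into a function with an explicit fallback branch. The names `classify` and `dispatch_all`, the variable `d`, and the assumption that the prefix is exactly eight digits are all hypothetical.

```shell
#!/bin/bash
# Assumption: the YYYYMMDD prefix is exactly eight digits.
d='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

# Print the group name for one file name (hypothetical helper).
classify() {
    case ${1##*/} in    # ${1##*/} strips any leading directory part
        ${d}_*_bulk_import.csv)     echo bulk ;;
        ${d}_*_genstats_import.csv) echo genstats ;;
        ${d}_*allstats.csv)         echo allstats ;;
        *)                          echo unknown ;;   # no known pattern matched
    esac
}

# Report the group of every CSV file in a directory (default: current one).
# The subshell body keeps the cd from affecting the caller.
dispatch_all() (
    cd "${1:-.}" || exit 1
    for f in *.csv; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        echo "$f -> $(classify "$f")"
    done
)
```

In a real script, the `echo` in `dispatch_all` would be replaced by a second `case` on the output of `classify`, or the processing commands could be called directly from the branches.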
jazgot answered Oct 14 '22 15:10