I have a shell script, say data.sh, which takes a single argument, say Table_1.
I have a test file, generated by a different script, that contains more than 1000 arguments to pass to data.sh, one per line.
The file looks like below:
Table_1
Table_2
Table_3
Table_4
and so on.
Now I want to run the script in parallel. I am doing this with a cron job.
First I split the test file into 20 parts using the split command in Linux:
split -l $(($(wc -l < test )/20 + 1)) test
This divides the test file into 20 parts named xaa, xab, xac, and so on.
Then run the cron job:
* * * * * while IFS= read -r a; do /home/XXXX/data.sh "$a"; done < /home/xxxx/xaa
* * * * * while IFS= read -r a; do /home/XXXX/data.sh "$a"; done < /home/xxxx/xab
and so on.
As this involves a lot of manual work, I would like to do it dynamically.
Here is what I want to achieve:
1) As soon as I get the test file, I would like it to be split into, say, 20 files automatically and stored in a particular place.
2) Then I would like to schedule a cron job for every day at 5 AM that passes the 20 files as arguments to the script.
What is the best way to implement this? Any answers with explanation will be appreciated.
Here is what you could do. Create two cron jobs:
- file_splitter.sh -> splits the file and stores the pieces in a particular directory
- file_processor.sh -> picks up one file at a time from that directory, does a read loop, calls data.sh, and removes the file after successful processing

Schedule file_splitter.sh to run ahead of file_processor.sh.
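A minimal sketch of the two pieces, written as shell functions here so they are easy to test; in practice each would be its own cron-driven script. The paths, the chunk count, and the `part_` prefix are all assumptions to adapt to your layout:

```shell
#!/bin/sh
# Sketch only -- directory paths, chunk count, and file prefix are examples.

# file_splitter.sh: split $1 into $3 chunks under $2, then consume $1
# so the same input is not re-split on the next cron run.
split_input() {
    incoming=$1 split_dir=$2 parts=$3
    [ -f "$incoming" ] || return 0          # nothing to do yet
    mkdir -p "$split_dir"
    split -l "$(( $(wc -l < "$incoming") / parts + 1 ))" "$incoming" "$split_dir/part_"
    rm "$incoming"                          # consume the input
}

# file_processor.sh: run the worker ($2, e.g. data.sh) on every line of
# each chunk, and delete a chunk only after it was fully processed.
process_dir() {
    split_dir=$1 worker=$2
    for f in "$split_dir"/part_*; do
        [ -f "$f" ] || continue             # glob matched nothing
        while IFS= read -r table; do
            "$worker" "$table"
        done < "$f" && rm "$f"              # remove only after success
    done
}
```

Removing each chunk after processing is what lets file_processor.sh be re-run safely: a second invocation simply finds fewer (or no) files.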
If you want more parallelism, you can make file_splitter.sh write the split files into several directories, a few files in each. Let's say they are called sub1, sub2, and so on. Then you can schedule multiple instances of file_processor.sh and pass a subdirectory name as the argument. Since the split files are stored in separate directories, only one job processes the files in any given subdirectory.
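The fan-out step could look like this round-robin distribution; the sub1, sub2, ... names and the worker count are examples, not anything prescribed:

```shell
#!/bin/sh
# Sketch: move split chunks round-robin into sub1..subN so that N
# independent file_processor.sh instances can each own one directory.
distribute() {
    split_dir=$1 workers=$2
    i=0
    for f in "$split_dir"/part_*; do
        [ -f "$f" ] || continue             # glob matched nothing
        d="$split_dir/sub$(( i % workers + 1 ))"
        mkdir -p "$d"
        mv "$f" "$d/"
        i=$(( i + 1 ))
    done
}
```

With that layout, the crontab entries could look like (times and paths are examples):

0 5 * * * /home/xxxx/file_splitter.sh
10 5 * * * /home/xxxx/file_processor.sh sub1
10 5 * * * /home/xxxx/file_processor.sh sub2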
It's better to keep the cron command as simple as possible.
* * * * * /path/to/file_processor.sh
is better than
* * * * * while IFS= read -r a; do /home/XXXX/data.sh "$a"; done < /home/xxxx/xab
Makes sense?
I wrote a post about how to manage cron jobs effectively; you may want to take a look at it:
Managing log files created by cron jobs