I have a queue of thousands of shell jobs in an ordered list, and I need to run 4 of them in parallel from the top down to avoid saturating the CPU. If I simply split the job list into 4 batch scripts, the runtimes don't align and one script finishes well ahead of the others, which still have many jobs left to do. I'm looking for a way to have all 4 batch jobs pull the next available job from the top of the queue.
I've tried bash's at jobs, but it's not what I'm looking for.
I still like to make such scripts myself :p Below is a script that runs up to N commands in parallel. Once a process exits, its entry in the slot array is reused to store the next child PID.
run_from_file.sh
#!/bin/bash
N=4          # Number of jobs to run in parallel
T=0          # Counter for the number of jobs started
Q=()         # Slot array: one child PID (or 0 if free) per slot
FILE='jobs.txt'

# Free the slots of children that have exited
function _clean {
    for ((i = 0; i < N; ++i)); do
        # /proc/<pid> disappears when the process exits
        if [ ! -d "/proc/${Q[$i]}" ]; then
            Q[$i]=0
        fi
    done
}

# Set up the Q
for ((i = 0; i < N; i++)); do
    Q[$i]=0
done

while read -r line; do
    # Try to find an open spot (Q[i] = 0) BEFORE launching,
    # so no more than N jobs ever run at once
    while true; do
        for ((i = 0; i < N; ++i)); do
            if [ "${Q[$i]}" -eq 0 ]; then
                break 2
            fi
        done
        # No free entry found: reap finished children and retry
        _clean
        sleep 0.1    # Avoid a hot busy-wait
    done
    echo "$line"
    $line &          # Run the job (word splitting is intentional here)
    Q[$i]=$!         # Store its PID in the free slot found above
    ((T++))
done < "${FILE}"
wait
echo "Processed ($T/$(wc -l < "${FILE}")) jobs"
exit 0
jobs.txt
sleep 1s
sleep 1s
sleep 1s
sleep 1s
sleep 10s
sleep 5s
sleep 2s
sleep 2s
sleep 4s
sleep 3s
sleep 3s
sleep 3s
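As an aside, on bash 4.3 and newer the PID array can be dropped entirely: `wait -n` blocks until any one background job exits, which gives the same pull-the-next-job behavior with less bookkeeping. A minimal sketch (it generates its own small `jobs.txt` of `sleep` commands, mirroring the example list above):

```shell
#!/bin/bash
N=4               # Pool size
FILE='jobs.txt'

# Demo input: six one-second jobs, like the example list above
for i in 1 2 3 4 5 6; do echo "sleep 1"; done > "${FILE}"

T=0               # Jobs started
running=0         # Jobs currently in flight
while read -r line; do
    $line &                    # Pull the next job from the top of the queue
    T=$((T + 1))
    running=$((running + 1))
    if ((running >= N)); then
        wait -n                # Block until any ONE child exits (bash >= 4.3)
        running=$((running - 1))
    fi
done < "${FILE}"
wait                           # Drain the remaining jobs
echo "Processed $T jobs"
```

Because `wait -n` returns as soon as the first pooled child exits, the queue never stalls on one slow job the way fixed batches do.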
OLD:
I like to create such things myself because it's scalable. For instance, it lets you do something before wait is called, or collect the child process IDs and store them in a text file.
run_from_file.sh
#!/bin/bash
X=0          # Counter
N=4          # Total number of parallel processes
FILE='jobs.txt'

while read -r line; do
    echo "$line"
    $line &
    # Raise counter; every N jobs, wait on the whole batch
    ((X = ++X % N))
    if [ "$X" -eq 0 ]; then
        echo "Waiting"
        wait    # Wait for the current batch of processes to finish
    fi
done < "${FILE}"
wait            # Also wait for a final, partial batch
exit 0
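For completeness: if you don't need to do anything with the child PIDs, GNU xargs already implements this pooling. `-P 4` keeps four processes alive and starts the next input line the moment one exits. A sketch (it assumes every line of jobs.txt is a self-contained command, and it creates a small demo file first):

```shell
#!/bin/sh
# Demo input: two silent jobs plus one that prints, so there is visible output
printf '%s\n' 'sleep 1' 'sleep 1' 'echo done' > jobs.txt

# -P 4 keeps at most 4 commands running; -I {} substitutes one whole input
# line, which sh -c then executes. A new line is started as soon as any of
# the 4 slots frees up.
out=$(xargs -P 4 -I {} sh -c '{}' < jobs.txt)
echo "$out"
```

One caveat: a job line containing single quotes would break the `sh -c '{}'` wrapper, so this only suits simple command lines.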