I'm running a job on several different servers (up to 25) using GNU parallel.
The shell script which implements this currently does:
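# Join the array into the comma-separated host list that -S expects
servers=$(IFS=,; echo "${some_list_of_servers[*]}")
parallel --tag --nonall -S "$servers" "some_command"
# GNU Parallel's exit status is the number of failed jobs (capped in large runs)
state=$?
echo -n "RESULT: "
if [ "$state" -eq 0 ]
then
echo "All jobs successful"
else
echo "$state jobs failed"
fi
return $state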
where some_list_of_servers is an array and some_command is, for instance, git fetch.
What I want is a lot more information than just how many jobs failed: I want to know which command failed, and on which server.
I've been through the man page, Google, and SO, but can't find the switch(es) I'm looking for.
Any help gratefully appreciated.
WeeDom
EDIT in response to Answer 1:
I tried that, and something odd is happening.
weedom@host1: ~/$ parallel --tag --nonall -j8 --joblog test.log -S host1,host2 uptime
host2 10:41:17 up 36 days, 20:45, 1 user, load average: 0.00, 0.00, 0.00
host1 10:41:17 up 22:34, 3 users, load average: 0.06, 0.11, 0.04
weedom@host1: ~/$ cat test.log
Seq  Host   Starttime       Runtime            Send  Receive  Exitval  Signal  Command
1    host1  1403689277.067  0.519999980926514  0     0        0        0       uptime
No matter how many hosts I add to -S, only the last one to complete seems to end up in test.log.
I've added a follow-up question here: GNU Parallel - --joblog only logging last job
You want to use the --joblog option, as shown in the docs. GNU Parallel can even re-run just the failed jobs with --resume-failed (pointed at the same --joblog file).
For example, running this script (saved as failjob):
#!/bin/bash
# Fail (exit 1) when the argument is a multiple of 3, succeed otherwise
jobmod=$(( $1 % 3 ))
if [ "$jobmod" -eq 0 ]
then
exit 1
else
exit 0
fi
on several hosts like this:
$ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob
gives
$ more out.log
Seq  Host   Starttime       Runtime  Send  Receive  Exitval  Signal  Command
1    srv01  1403542514.713  0.267    0     0        0        0       ./failjob 1
3    srv02  1403542514.717  0.266    0     0        1        0       ./failjob 3
4    srv03  1403542514.719  0.266    0     0        0        0       ./failjob 4
2    srv04  1403542514.715  0.397    0     0        0        0       ./failjob 2
5    srv01  1403542514.983  0.231    0     0        0        0       ./failjob 5
6    srv02  1403542514.986  0.368    0     0        1        0       ./failjob 6
7    srv03  1403542514.988  0.388    0     0        0        0       ./failjob 7
8    srv04  1403542515.121  0.437    0     0        0        0       ./failjob 8
9    srv01  1403542515.221  0.343    0     0        1        0       ./failjob 9
10   srv02  1403542515.356  0.388    0     0        0        0       ./failjob 10
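To get straight at "which server, which command" from a log like this, the failed rows can be filtered with awk. This is a minimal sketch, assuming the joblog is tab-separated with the column order shown in the header (Host in column 2, Exitval in column 7, Signal in column 8, Command in column 9); failed_jobs is a hypothetical helper name, not part of GNU Parallel.

```shell
# Hypothetical helper (not part of GNU Parallel): list the jobs in a joblog
# that exited non-zero or were killed by a signal.
# Assumes the tab-separated joblog layout:
#   Seq Host Starttime Runtime Send Receive Exitval Signal Command
failed_jobs() {
    local joblog=$1
    # NR > 1 skips the header line; print Host, Exitval, Signal and Command
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $2 "\t" $7 "\t" $8 "\t" $9 }' "$joblog"
}
```

Run as failed_jobs out.log against the log above, it should print the rows for ./failjob 3 and ./failjob 6 (both on srv02) and ./failjob 9 (on srv01). Checking Signal as well as Exitval means jobs killed by a signal are reported too, not just ones that exited with an error code.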