I need the sum of an integer contained in several webpages. getPages() parses the integer and sets it to $subTotal. getPages() is called in a for loop in the background, but how do I get the sum of $subTotal? Is this a subshelling problem?
This is what I've tried so far.
#!/bin/bash
total=0
getPages(){
subTotal=$(lynx -dump http://"$(printf "%s:%s" $1 $2)"/file.html | awk -F, 'NR==1 {print $1}' | sed 's/\s//g')
total=$(($total+$subTotal))
echo "SubTotal: " $subTotal "Total: " $total
}
# /output/ SubTotal: 22 Total: 22
# /output/ SubTotal: 48 Total: 48 //Note Total should be 70
ARRAY=(
'pf2.server.com:6599'
'pf5.server.com:1199'
...
)
for server in ${ARRAY[@]} ; do
KEY=${server%%:*}
VALUE=${server##*:}
getPages $KEY $VALUE &
done
wait
echo $total
exit 0
# /output/ 0
Any advice appreciated.
Yes, this is a subshelling problem. Everything executed in a ... & list (i.e. your getPages $KEY $VALUE &) is executed in a subshell, which means that changes of variables there do not affect the parent shell.
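To see the effect in isolation, here is a minimal sketch without any network access. The function add_to_total and the variable after_background are made up for this illustration; the numbers 22 and 48 are the subtotals from your output.

```bash
#!/bin/bash
total=0

# Hypothetical helper: adds its argument to the global total.
add_to_total() {
    total=$(( total + $1 ))
    echo "inside call: total=$total"
}

# Background jobs run in subshells: each one changes only its own copy of total.
add_to_total 22 &
add_to_total 48 &
wait
after_background=$total
echo "after background jobs: total=$total"    # the parent's total is still 0

# The same calls in the foreground run in the current shell and do accumulate.
add_to_total 22
add_to_total 48
echo "after foreground calls: total=$total"
```

The background calls each print their own subtotal as the total, which is exactly what you observed, while the parent shell never sees those assignments.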
I think one could do something using coprocesses (i.e. communication by streams), or maybe using GNU parallel or pexec.
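For a small number of servers, plain bash might already be enough: let every background job print its subtotal to standard output and do the summing in the parent shell. The following is only a sketch under assumptions. get_subtotal is a hypothetical stand-in for your lynx/awk/sed pipeline that returns fixed numbers (22 and 48, as in your output), so that it runs without the servers being reachable.

```bash
#!/bin/bash
# Hypothetical stand-in for the real download-and-parse pipeline:
# prints one integer on one line for the given host:port.
get_subtotal() {
    case $1 in
        pf2.server.com:6599) echo 22 ;;
        pf5.server.com:1199) echo 48 ;;
        *)                   echo 0  ;;
    esac
}

ARRAY=(
    'pf2.server.com:6599'
    'pf5.server.com:1199'
)

total=0
# The loop runs in the current shell and reads what the jobs print.
while read -r subtotal; do
    total=$(( total + subtotal ))
    echo "subtotal: $subtotal, total: $total"
done < <(
    # All jobs start in the background; their output goes into one shared pipe.
    for server in "${ARRAY[@]}"; do
        get_subtotal "$server" &
    done
    wait
)
echo "total: $total"
```

The order of the subtotals depends on which job finishes first, but the final sum does not. Each job here writes a single short line, so I would not expect the lines to get mixed up; with longer output per job that may no longer hold.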
Here is an example with pexec, using standard output to communicate from the individual processes. I used a simpler command, as the servers you listed are not accessible from here. This counts the lines on some webpages and sums them up.
ARRAY=(
'www.gmx.de:80'
'www.gmx.net:80'
'www.gmx.at:80'
'www.gmx.li:80'
)
(( total = 0 ))
while read subtotal
do
(( total += subtotal ))
echo "subtotal: $subtotal, total: $total"
done < <(
pexec --normal-redirection --environment hostname --number ${#ARRAY[*]} \
--parameters "${ARRAY[@]}" --shell-command -- '
lynx -dump http://$hostname/index.html | wc -l'
)
echo "total: $total"
We are using some tricks here:

- Process substitution (<( ... )) together with input redirection (<) instead of a simple pipe. This keeps the while loop in the main shell, so total is still set after the loop; with a pipe, the loop itself would run in a subshell.
- The (( ... )) arithmetic expression command. I could have used let instead, but then I would have to quote everything or avoid spaces. (Your total=$(( total + subtotal )) would have worked, too.)
- The options to pexec:
  - --normal-redirection means redirecting all the output streams from the subprocesses together into the output stream of pexec. (I'm not sure whether this could result in some confusion if two processes want to write at the same time.)
  - --environment hostname passes the differing parameter for each execution as an environment variable. Otherwise it would be a simple command line argument.
  - --number ${#ARRAY[*]} (which becomes --number 4 in our case) makes sure that all the processes will be started in parallel, instead of only as many as we have CPUs or some other heuristic. (This is for network-roundtrip-bound work. For CPU-bound or bandwidth-bound stuff, a smaller number would be better.)
  - --shell-command makes sure the command will be evaluated by a shell, instead of trying to execute it directly. This is necessary because of the pipeline in there.
  - --parameters "${ARRAY[@]}" lists the actual arguments, i.e. the elements of the array. For each of them a separate version of the command will be started.
  - After the final -- comes the command, as a single '-quoted string, to avoid premature interpretation of the $hostname in there by the outer shell. The command simply downloads the file and pipes it to wc -l, counting the lines.

Example output:
subtotal: 1120, total: 1120
subtotal: 968, total: 2088
subtotal: 1120, total: 3208
subtotal: 1120, total: 4328
total: 4328
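Why process substitution instead of a simple pipe? A small sketch of the difference, assuming bash with its default settings (the lastpipe option not enabled). The variable names piped_total and subst_total are made up for this comparison, and printf stands in for the downloading processes.

```bash
#!/bin/bash
# Variant 1: simple pipe. In bash every part of a pipeline runs in a subshell,
# so the loop adds to a copy of piped_total that is lost when the loop ends.
piped_total=0
printf '%s\n' 22 48 | while read -r n; do
    piped_total=$(( piped_total + n ))
done
echo "after pipe: $piped_total"

# Variant 2: process substitution with input redirection. Only the producer
# runs in a subshell; the loop runs in the current shell and keeps its result.
subst_total=0
while read -r n; do
    subst_total=$(( subst_total + n ))
done < <(printf '%s\n' 22 48)
echo "after process substitution: $subst_total"
```

Other shells may behave differently for the first variant (as far as I know, ksh and zsh run the last pipeline element in the current shell), so this comparison applies to bash specifically.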
Here is (part of) the output of ps -f while this is running:
2799 pts/1 Ss 0:03 \_ bash
5427 pts/1 S+ 0:00 \_ /bin/bash ./download-test.sh
5428 pts/1 S+ 0:00 \_ /bin/bash ./download-test.sh
5429 pts/1 S+ 0:00 \_ pexec --number 4 --normal-redirection --environment hostname --parame...
5430 pts/1 S+ 0:00 \_ /bin/sh -c ? lynx -dump http://$hostname/index.html | wc -l
5434 pts/1 S+ 0:00 | \_ lynx -dump http://www.gmx.de:80/index.html
5435 pts/1 S+ 0:00 | \_ wc -l
5431 pts/1 S+ 0:00 \_ /bin/sh -c ? lynx -dump http://$hostname/index.html | wc -l
5436 pts/1 S+ 0:00 | \_ lynx -dump http://www.gmx.net:80/index.html
5437 pts/1 S+ 0:00 | \_ wc -l
5432 pts/1 S+ 0:00 \_ /bin/sh -c ? lynx -dump http://$hostname/index.html | wc -l
5438 pts/1 S+ 0:00 | \_ lynx -dump http://www.gmx.at:80/index.html
5439 pts/1 S+ 0:00 | \_ wc -l
5433 pts/1 S+ 0:00 \_ /bin/sh -c ? lynx -dump http://$hostname/index.html | wc -l
5440 pts/1 S+ 0:00 \_ lynx -dump http://www.gmx.li:80/index.html
5441 pts/1 S+ 0:00 \_ wc -l
We can see that really everything runs in parallel, as much as possible on my one-processor system.