Parallel processing with xargs in bash

I had a small script that sourced each OpenStack tenant's credentials and fetched some output with the help of Python. The reports took too long to generate, and it was suggested that I use xargs. My earlier code was as below.

#!/bin/bash
cd /scripts/cloud01/floating_list

rm -rf ./reports/openstack_reports/
mkdir -p ./reports/openstack_reports/

source ../creds/base
for tenant in A B C D E F G H I J K L M N O P Q R S T
do
  source ../creds/$tenant
  python ../tools/openstack_resource_list.py > ./reports/openstack_reports/$tenant.html

done
lftp -f ./lftp_script

Now I have put xargs in the script and the script looks something like this.

#!/bin/bash
cd /scripts/cloud01/floating_list

rm -rf ./reports/openstack_reports/
mkdir -p ./reports/openstack_reports/

source ../creds/base

# Need xargs idea below
cat tenants_list.txt | xargs -P 8 -I '{}' # something that takes the tenant name and source
TENANT_NAME={}
python ../tools/openstack_resource_list.py > ./reports/openstack_reports/$tenant.html
lftp -f ./lftp_script

In this script, how am I supposed to implement source ../creds/$tenant? Each tenant's credentials need to be sourced before its report is generated, and I am not sure how to include that with xargs for parallel execution.

asked Dec 22 '17 07:12

Heenashree Khandelwal

1 Answer

xargs can't easily run a shell function ... but it can run a shell.

# If the tenant names are this simple, don't put them in a file
printf '%s\n' {A..T} |
xargs -P 8 -I {} bash -c 'source ../creds/"$0"
      python ../tools/openstack_resource_list.py > ./reports/openstack_reports/"$0".html' {}

Somewhat obscurely, the argument after bash -c '...' gets exposed as $0 inside the script.
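To see that argument mapping in isolation, here is a minimal sketch. The tenant names are just placeholder strings, and printf stands in for the real work.

```shell
# The first word after the command string becomes $0 inside the script;
# any further words become $1, $2, and so on.
first=$(bash -c 'printf "%s" "$0"' tenantA)
second=$(bash -c 'printf "%s" "$1"' tenantA tenantB)
echo "\$0 was $first, \$1 was $second"
```

This is why the single {} that xargs substitutes can be read as "$0" in the one-tenant-per-shell variant.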

If you want to keep the tenants in a file, xargs -a filename is a good way to avoid the useless use of cat, though it's not portable to all xargs implementations. (Redirecting with xargs ... <filename is obviously completely portable.)
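As a rough sketch of the redirection form, the following builds a throwaway tenants_list.txt in a temporary directory (an assumption for the demo, not your real file) and uses echo in place of the source and python steps. -P 1 is used here only so the output order is predictable.

```shell
# Throwaway demo data; the real file would be your own tenants_list.txt
tmpdir=$(mktemp -d)
printf '%s\n' A B C > "$tmpdir/tenants_list.txt"

# Input comes from the file via plain redirection, no cat needed
result=$(xargs -P 1 -I {} bash -c 'echo "would process tenant $0"' {} < "$tmpdir/tenants_list.txt")
echo "$result"

rm -rf "$tmpdir"
```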

For efficiency, you could refactor the script to loop over as many arguments as possible:

printf '%s\n' {A..T} |
xargs -n 3 -P 8 bash -c 'for tenant; do
      source ../creds/"$tenant"
      python ../tools/openstack_resource_list.py > ./reports/openstack_reports/"$tenant".html
  done' _

This will run at most 8 shell instances in parallel, with at most 3 tenants assigned to each. With 20 tenants that is only 7 invocations in total (six groups of three and one of two), so the limit of 8 is never reached; with this few arguments, the difference in performance is probably negligible.
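If you want to check how xargs groups the arguments before wiring in the real command, a quick sketch is to let it run echo instead. Each output line corresponds to one invocation xargs would start.

```shell
# 20 tenant names, at most 3 per invocation; echo stands in for bash -c '...'
batches=$(printf '%s\n' A B C D E F G H I J K L M N O P Q R S T | xargs -n 3 echo)
echo "$batches"
```

This prints seven lines, the last of which holds only S and T.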

Because we are now actually receiving a list of arguments, we pass _ as the value to populate $0 with. It is only a placeholder: $0 has to be set to something so that the real arguments land in $1, $2, and so on, which is what for tenant loops over.
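A small sketch of what goes wrong without the placeholder, with printf standing in for the real work:

```shell
# With the placeholder, _ fills $0 and the loop sees A, B and C
with_placeholder=$(bash -c 'for tenant; do printf "%s " "$tenant"; done' _ A B C)

# Without it, A is swallowed as $0 and the loop only sees B and C
without_placeholder=$(bash -c 'for tenant; do printf "%s " "$tenant"; done' A B C)

echo "with _:    $with_placeholder"
echo "without _: $without_placeholder"
```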

If sourcing one tenant's file can leave behind settings that the next tenant's file does not overwrite (say, some tenants set variables which need to be unset for others), that complicates matters. Maybe post a separate question if you need help resolving that, or just fall back to the first variant, where each tenant runs in a separate shell instance.
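One possible middle ground, offered only as a sketch and not something the setup above requires, is to wrap each loop iteration in a subshell so that whatever a credentials file sets disappears before the next tenant. The demo below uses made-up credential files and hypothetical variable names (DEMO_CREDS_DIR, DEMO_TENANT, DEMO_EXTRA_FLAG), with printf in place of the python call.

```shell
# Made-up credential files: tenant A sets an extra variable that B does not touch
DEMO_CREDS_DIR=$(mktemp -d)
export DEMO_CREDS_DIR
printf 'export DEMO_TENANT=A\nexport DEMO_EXTRA_FLAG=1\n' > "$DEMO_CREDS_DIR/A"
printf 'export DEMO_TENANT=B\n' > "$DEMO_CREDS_DIR/B"

# Plain loop: DEMO_EXTRA_FLAG from A is still set while B is processed
leaky=$(bash -c 'for t; do
    source "$DEMO_CREDS_DIR/$t"
    printf "%s:%s " "$DEMO_TENANT" "${DEMO_EXTRA_FLAG:-unset}"
  done' _ A B)

# Subshell per iteration: each tenant starts from the parent shell state
isolated=$(bash -c 'for t; do
    ( source "$DEMO_CREDS_DIR/$t"
      printf "%s:%s " "$DEMO_TENANT" "${DEMO_EXTRA_FLAG:-unset}" )
  done' _ A B)

echo "leaky:    $leaky"
echo "isolated: $isolated"
rm -rf "$DEMO_CREDS_DIR"
```

The subshell costs an extra fork per tenant, which is small next to running the Python script.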

answered Oct 11 '22 14:10

tripleee
