 

How to execute thousands of commands in parallel using xargs?

Tags: linux, bash, xargs

I'm currently queuing up a bunch of jobs through qsub in a loop:

for fn in $FNS; do
    queue_job $(options_a $fn) $(options_b $fn)
done

queue_job is a script that queues up jobs using qsub, and options_a/options_b are functions I wrote that add a few job options based on the filename. I queue up to 5k jobs this way, and I'd like to add them all to the queue at once (or in larger blocks, such as 40 at a time) instead of in a loop.

I know I can send lines to xargs and execute them in parallel as

??? | xargs -P 40 -I{} command {}

but I'm not sure how to translate my for loop into that form.

asked by pmdaly


1 Answer

The qsub interface submits one job at a time - it does not provide bulk submission, which limits the upside of submitting jobs in parallel (job submission itself is usually fast).

For this specific case there are two bash functions (options_a and options_b) that expand to job-specific parameters based on the filename. This limits direct execution with xargs, as suggested in the comments: the functions are shell functions, so they are not executables on the PATH and will not be visible to the commands that xargs launches unless they are exported.
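As an aside not in the original answer: if a separate wrapper file is undesirable, bash can export the functions so that child shells started by xargs inherit them. A minimal sketch, assuming options_a/options_b are plain bash functions, queue_job is an executable on the PATH, and the filenames in $FNS contain no whitespace:

export -f options_a options_b

# One filename per line; each xargs slot runs a short bash -c script in
# which the exported functions are visible ("_" only fills $0).
printf '%s\n' $FNS |
    xargs -P 40 -I{} bash -c 'queue_job $(options_a "$1") $(options_b "$1")' _ {}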

Options:

Create a wrapper for queue_job that sources (or includes) the functions, and call the wrapper from xargs:

printf '%s\n' $FNS | xargs -P 40 -I{} queue_job_x1 '{}'

queue_job_x1:

#! /bin/bash
# One filename is passed per invocation
fn=$1

function options_a {
   ...
}

function options_b {
   ...
}

queue_job $(options_a "$fn") $(options_b "$fn")
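To use the wrapper this way, make it executable (chmod +x queue_job_x1) and either put it on the PATH or give xargs its full path.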

It might be a good idea to put the relevant functions into a .sh file that can be sourced by multiple scripts.
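A minimal sketch of that layout, with job_options.sh as a hypothetical name for the shared file holding options_a/options_b:

#! /bin/bash
# queue_job_x1, variant that sources the shared helper file instead of
# defining the functions inline; job_options.sh is assumed to sit next
# to the wrapper
source "$(dirname "$0")/job_options.sh"

fn=$1
queue_job $(options_a "$fn") $(options_b "$fn")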

answered by dash-o