
How to feed a large array of commands to GNU Parallel?

I'm evaluating whether GNU Parallel can be used to search files stored on a system in parallel. There is at most one file per day of year (doy), so a maximum of 366 files per year. Let's say there are 3660 files on the system (about 10 years' worth of data). The system could be a multi-CPU, multi-core Linux machine or a multi-CPU Solaris machine.

I'm storing the search commands to run on the files in an array (one command per file). This is what I'm doing right now in bash, but it gives me no control over how many searches start in parallel (I definitely don't want to start all 3660 at once):

#!/usr/bin/env bash
declare -a cmds
declare -i cmd_ctr=0

while [[ <condition> ]]; do
    if [[ -s $cur_archive_path/log.${doy_ctr} ]]; then
      cmds[$cmd_ctr]="<cmd_to_run>"
      let cmd_ctr++
    fi
done

declare -i arr_len=${#cmds[@]}
for (( i=0; i<arr_len; i++ )); do
  # Get the command and run it in the background.
  # Quoting keeps the command string intact until eval re-parses it.
  eval "${cmds[$i]}" &
done
wait

If I were to use parallel (which automatically detects the number of CPUs/cores and starts only that many searches at a time), how can I reuse the array cmds with parallel and rewrite the code above? The alternative is to write all the commands to a file and then run cat cmd_file | parallel.

Say No To Censorship asked May 07 '13 19:05



1 Answer

The GNU Parallel manual (https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Using-shell-variables) gives this example:

parallel echo ::: "${V[@]}"

You do not want the echo, so:

parallel ::: "${cmds[@]}"
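
To make this concrete, here is a minimal sketch of the array approach. The scratch directory, the sample log files, and the use of grep -c as a stand-in for <cmd_to_run> are assumptions made only for illustration; they are not from the question. -j is GNU Parallel's option for capping the number of simultaneous jobs, and without it parallel runs one job per CPU core.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: three small log files in a scratch directory.
demo_dir=$(mktemp -d)
printf 'ok\nERROR disk\n' > "$demo_dir/log.1"
printf 'ERROR net\nERROR cpu\n' > "$demo_dir/log.2"
: > "$demo_dir/log.3"    # empty file, skipped by the -s test below

declare -a cmds=()
for f in "$demo_dir"/log.*; do
  if [[ -s $f ]]; then
    # grep -c is only a stand-in for the real <cmd_to_run>.
    # printf %q quotes the path so the command string survives re-parsing.
    cmds+=("grep -c ERROR $(printf '%q' "$f")")
  fi
done

# Run the commands only if GNU Parallel (not another tool named parallel) is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # Each array element is treated as one complete command line.
  # -j 4 caps concurrency at four jobs; drop it to get one job per CPU core.
  parallel -j 4 ::: "${cmds[@]}"
fi
```

Because every array element is passed as a separate argument after :::, commands that contain spaces stay intact as long as the array expansion is quoted.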

If you do not need the cmds array for anything else, you can skip it and use sem (an alias for parallel --semaphore) to start each command as soon as its file is found; see https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-mutex-and-counting-semaphore

while [[ <condition> ]]; do
  if [[ -s $cur_archive_path/log.${doy_ctr} ]]; then
    sem -j+0 <cmd_to_run>
  fi
done
sem --wait
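
A filled-in version of the sem loop might look like the sketch below. The scratch files, the semaphore name demo_search, and grep -c as the search command are all hypothetical placeholders. --id names the semaphore so that the final --wait refers to the same one, and -j+0 allows as many simultaneous jobs as there are CPU cores.

```shell
#!/usr/bin/env bash
# Hypothetical demo data in a scratch directory.
demo_dir=$(mktemp -d)
printf 'ERROR a\n' > "$demo_dir/log.1"
printf 'ERROR b\nERROR c\n' > "$demo_dir/log.2"
: > "$demo_dir/log.3"    # empty file, skipped below

# Only call sem if it and GNU Parallel are actually installed.
have_sem=no
if command -v sem >/dev/null 2>&1 &&
   parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  have_sem=yes
fi

started=0
for f in "$demo_dir"/log.*; do
  [[ -s $f ]] || continue          # skip missing or empty files
  started=$((started + 1))
  if [[ $have_sem == yes ]]; then
    # sem returns once the job has been started in the background;
    # it blocks only while all job slots are taken.
    sem --id demo_search -j+0 grep -c ERROR "$f"
  fi
done

# Block until every job started under this semaphore has finished.
if [[ $have_sem == yes ]]; then
  sem --id demo_search --wait
fi
```

The trade-off compared with the array form is that no list of commands is kept, so this fits best when the commands are not needed again later in the script.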

You have not described what <condition> might be. If it is simply something like a for-loop over the files, you could replace the whole script with:

parallel 'if [ -s {} ] ; then cmd_to_run {}; fi' ::: "$cur_archive_path"/log.{1..3660}

(based on https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Composed-commands).
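
As a small runnable illustration of the composed-command form, the sketch below uses a scratch directory with a few sample files and grep -c in place of cmd_to_run; all of these are assumptions for the demo. Brace expansion produces every candidate name whether or not the file exists, and the [ -s {} ] test inside the composed command filters out missing and empty files.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: log.3 and log.5 are deliberately missing, log.2 is empty.
demo_dir=$(mktemp -d)
printf 'ERROR a\n' > "$demo_dir/log.1"
: > "$demo_dir/log.2"
printf 'ERROR b\nERROR c\n' > "$demo_dir/log.4"

# Brace expansion happens outside the quotes, so the directory part stays
# a single word even if it contains spaces.
candidates=("$demo_dir"/log.{1..5})

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # {} is replaced by one candidate path per job.
  # -k (--keep-order) prints results in input order rather than completion order.
  parallel -k 'if [ -s {} ]; then grep -c ERROR {}; fi' ::: "${candidates[@]}"
fi
```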

Ole Tange answered Nov 04 '22 00:11
