Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I use parallel programming/multi threading in my bash script?

This is my script:

#!/bin/bash
#script to loop through directories to merge fastq files
sourcedir=/path/to/source
destdir=/path/to/dest

for f in $sourcedir/*
do
    fbase=$(basename "$f")
    echo "Inside $fbase"
    zcat $f/*R1*.fastq.gz | gzip > $destdir/"$fbase"_R1.fastq.gz
    zcat $f/*R2*.fastq.gz | gzip > $destdir/"$fbase"_R2.fastq.gz
done

Here there are about 30 sub-directories in the directory 'source'. Each sub-directory has certain R1.fastq.gz files and R2.fastq.gz that I want to merge into one R1.fastq.gz and R2.fastq.gz file, then save the merged file to the destination directory. My code works fine but I need to speed it up because of the amount of data. I just want to know is there any way I can implement multi threaded programming in my script? How can I run my script so that multiple jobs run in parallel? New to bash scripting, so any help would be appreciated.

like image 730
Komal Rathi Avatar asked Aug 22 '13 15:08

Komal Rathi


People also ask

Can you multithread in bash?

Parallel executes Bash scripts in parallel via a concept called multi-threading. This utility allows you to run different jobs per CPU instead of only one, cutting down on time to run a script.

Can multiple threads run in parallel?

Concurrency and Parallelism In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run concurrently on a separate processor, resulting in parallel execution, which is true simultaneous execution.


1 Answers

The simplest way is to execute the commands in the background, by adding & to the end of the command:

#!/bin/bash
#script to loop through directories to merge fastq files
sourcedir=/path/to/source
destdir=/path/to/dest

for f in $sourcedir/*
do
    fbase=$(basename "$f")
    echo "Inside $fbase"
    zcat $f/*R1*.fastq.gz | gzip > $destdir/"$fbase"_R1.fastq.gz &
    zcat $f/*R2*.fastq.gz | gzip > $destdir/"$fbase"_R2.fastq.gz &
done

From the bash manual:

If a command is terminated by the control operator ‘&’, the shell executes the command asynchronously in a subshell. This is known as executing the command in the background. The shell does not wait for the command to finish, and the return status is 0 (true). When job control is not active (see Job Control), the standard input for asynchronous commands, in the absence of any explicit redirections, is redirected from /dev/null.

like image 178
Zero Piraeus Avatar answered Sep 29 '22 11:09

Zero Piraeus