This is my script:
#!/bin/bash
#script to loop through directories to merge fastq files
sourcedir=/path/to/source
destdir=/path/to/dest
for f in $sourcedir/*
do
fbase=$(basename "$f")
echo "Inside $fbase"
zcat $f/*R1*.fastq.gz | gzip > $destdir/"$fbase"_R1.fastq.gz
zcat $f/*R2*.fastq.gz | gzip > $destdir/"$fbase"_R2.fastq.gz
done
Here there are about 30 sub-directories in the directory 'source'. Each sub-directory has certain R1.fastq.gz files and R2.fastq.gz that I want to merge into one R1.fastq.gz and R2.fastq.gz file, then save the merged file to the destination directory. My code works fine but I need to speed it up because of the amount of data. I just want to know is there any way I can implement multi threaded programming in my script? How can I run my script so that multiple jobs run in parallel? New to bash scripting, so any help would be appreciated.
Parallel executes Bash scripts in parallel via a concept called multi-threading. This utility allows you to run different jobs per CPU instead of only one, cutting down on time to run a script.
Concurrency and Parallelism In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run concurrently on a separate processor, resulting in parallel execution, which is true simultaneous execution.
The simplest way is to execute the commands in the background, by adding &
to the end of the command:
#!/bin/bash
#script to loop through directories to merge fastq files
sourcedir=/path/to/source
destdir=/path/to/dest
for f in $sourcedir/*
do
fbase=$(basename "$f")
echo "Inside $fbase"
zcat $f/*R1*.fastq.gz | gzip > $destdir/"$fbase"_R1.fastq.gz &
zcat $f/*R2*.fastq.gz | gzip > $destdir/"$fbase"_R2.fastq.gz &
done
From the bash manual:
If a command is terminated by the control operator ‘&’, the shell executes the command asynchronously in a subshell. This is known as executing the command in the background. The shell does not wait for the command to finish, and the return status is 0 (true). When job control is not active (see Job Control), the standard input for asynchronous commands, in the absence of any explicit redirections, is redirected from /dev/null.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With