We have a folder with 50 data files (next-gen DNA sequences) that need to be converted by running a Python script on each one. The script takes 5 hours per file, is single-threaded, and is largely CPU-bound (the core it runs on sits at 99% with minimal disk I/O).
Since I have a 4-core machine, I'd like to run 4 instances of this script at once to speed the process up considerably.
I suppose I could split the data into 4 folders and run the following bash script on each folder at the same time:
files=$(ls -1 *)
for file in $files
do
    out="$file.out"
    python fastq_groom.py "$file" "$out"
done
But there must be a better way of running it on the one folder. Bash, Python, Perl, or Windows tools are all options for us.
(Sadly, making the script itself multithreaded is beyond what we can do.)
Using @phs's xargs solution was the easiest way for us to solve the problem. We are, however, asking the original developer to implement @Björn's answer. Once again, thanks!
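For anyone finding this later, the xargs approach boils down to a one-liner along these lines. This is only a minimal sketch: it assumes GNU xargs and filenames without spaces or newlines, and uses the same "<input>.out" naming as the loop above.

# Feed every file in the folder to xargs, running at most 4 copies of the script at once.
printf '%s\n' * | xargs -P 4 -I {} python fastq_groom.py {} {}.out

The -P 4 flag caps the number of concurrent processes at 4, and -I {} substitutes each filename into both the input and output arguments.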
You can use the multiprocessing module. I suppose you have a list of files to process and a function to call for each file. Then you could simply use a worker pool like this:
from multiprocessing import Pool, cpu_count

# One worker process per CPU core; each worker is handed one file at a time.
pool = Pool(processes=cpu_count())
pool.map(process_function, file_list, chunksize=1)
If your process_function doesn't return a value, you can simply ignore the return value of map.
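For the specific case in the question, process_function can simply shell out to the existing script. A minimal sketch, assuming fastq_groom.py takes the input and output paths as its two arguments (as in the question's loop):

import subprocess
from multiprocessing import Pool, cpu_count
from pathlib import Path

def process_function(path):
    # Run the existing single-threaded script on one file;
    # the "<input>.out" naming matches the loop in the question.
    subprocess.run(["python", "fastq_groom.py", str(path), f"{path}.out"], check=True)

if __name__ == "__main__":
    # Adjust this to match only your data files (here: every regular file in the folder).
    file_list = sorted(p for p in Path(".").iterdir() if p.is_file())
    with Pool(processes=cpu_count()) as pool:
        pool.map(process_function, file_list, chunksize=1)

chunksize=1 is a sensible choice here because each file takes hours: it keeps the work evenly spread across the four workers instead of pre-assigning large batches.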