Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Run 4 concurrent instances of a python script on a folder of data files

We have a folder with 50 datafiles (next-gen DNA sequences) that need to be converted by running a python script on each one. The script takes 5 hours per file and it is single threaded and is largely CPU bound (the CPU core runs at 99% with minimal disk IO).

Since I have a 4 core machine, I'd like to run 4 instances of this script at once to vastly speed up the process.

I guess I could split the data into 4 folders and in run the following bash script on each folder at the same time:

files=`ls -1 *`
for $file in $files;
do
   out = $file+=".out" 
   python fastq_groom.py $file $out
done

But there must be a better way of running it on the one folder. We can use Bash/Python/Perl/Windows to do this.
(Sadly making the script multi threaded is beyond what we can do)


Using @phs xargs solution was the easiest way for us to solve the problem. We are however requesting the original developer implements @Björn answer. Once again thanks!

like image 663
Jon Rhoades Avatar asked Jan 23 '12 07:01

Jon Rhoades


People also ask

How do I run a python script in all files in a directory?

How do I run a Python script in all files in a directory? Use the Command Prompt to Execute a Command on Every File That Is Present in a Folder in Python. Use the os Module to Execute a Command on Every File in a Folder in Python. Use the pathlib Module to Execute a Command on Each File in a Folder in Python.

How do I run multiple instances of python?

Yes, you can run multiple python scripts at once and In python, we use multi-threading to run multiple works simultaneously. The simplest solution to run two Python processes concurrently is to run them from a bash file, and tell each process to go into the background with the & shell operator.

How do I run multiple python files in one file?

For running dynamically all the python program files in a given folder <FOLDER_NAME>, we can run a bash script file for doing this task. With the help of this above script, We can run all . py extension file which is located in the given folder path. With each iteration, This program will run every python file.


1 Answers

You can use the multiprocessing-module. I suppose you have a list of files to process and a function to call for each file. Then you could simply use a worker-pool like this:

from multiprocessing import Pool, cpu_count

pool = Pool(processes=cpu_count)
pool.map(process_function, file_list, chunksize=1)

If your process_function doesn't return a value, you can simply ignore the return-value.

like image 140
Björn Pollex Avatar answered Sep 20 '22 11:09

Björn Pollex