I am not a programmer, so simple answers will be appreciated. I am an MD involved in a bioinformatics project.
Let's say I have a Python script, abc.py, and a text file, commandline.txt, with 113 command lines (one per line) for this script to be run in parallel. I want each of these jobs to run in its own directory, named scatter.001, scatter.002, ..., scatter.113 (just a unique number for each), created in the directory I am executing the script from.
I am running Windows 7 with Python 2.7.
What is the command line for doing this? (python xyz\abc.py ....... )
PS:
-p 100 -m 10000000 -e 10 -k I:\Exome\Invex\analyses\PatientSet.load_maf.pkl ,UBE2Q1,RNF17,RNF10,REM1,PMM2,ZNF709,ZNF708,ZNF879,DISC1,RPL37,ZNF700,ZNF707,CAMK4,ZC3H10,ZC3H13,RNF115,ZC3H14,SPN,HMGCLL1,CEACAM5,GRIN1,DHX8,NUP98,XPC,SP4,SP5,CAMKV,SPPL3,RAB40C,RAB40A,COL7A1,GTSE1,OVCH1,FAM183B,KIAA0831,SPPL2B,ITGA8,ITGA9,MYO3B,ATP2A2,ITGA1,ITGA2,ITGA3,ITGA5,RIT1,ITGA7,TRHR,LOC100132288,DENND4A,DENND4B,TAP2,GAP43,PAMR1,HRH2,HRH3,HRH1,FBXL18,FAM169B,GHDC,SDK1,SDK2,THSD4,THSD1,ZFP161,CHST8,COL4A5,COL4A4,COL4A3,COL4A2,COL4A1,CHST1,CHST5,CHST4,ITGAX I:\Exome\Invex\analyses\First7.final_analysis_set.maf I:\Exome\Invex\temp\unzipped_power_files First7 I:\Exome\Invex\analyses\First7.individual_set.txt I:\Exome\Invex\hg19.fasta I:\Exome\Invex\hg19_encoded_by_trinucleotide.fasta I:\Exome\Invex\TCGA.hg19.June2011.gaf I:\Exome\Invex\hg19 I:\Exome\Invex\pph2_whpss_reduced I:\Exome\Invex\cosmic_num_times_each_chr_pos_mutated.tab
That is an example of one line in commandline.txt. I have 113 such lines in the file.
If you go this way, you're getting into Windows shell (batch) programming, which hardly anyone does. (Somebody does it, but they're an extremely small group.)
It would be simplest to write a second Python script that loops through the argument lines you want to pass to abc.py and launches abc.py once with each of them.
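import os
import sys
from subprocess import Popen

argfile = open('commandline.txt')
# enumerate(..., 1) counts from 1, giving folders scatter.001 ... scatter.113
for number, line in enumerate(argfile, 1):
    newpath = 'scatter.%03i' % number
    os.mkdir(newpath)
    # abc.py sits one level above each scatter folder; run it with this Python interpreter
    cmd = '"%s" %s %s' % (sys.executable, os.path.join('..', 'abc.py'), line.strip())
    print('Running %r in %r' % (cmd, newpath))
    Popen(cmd, shell=True, cwd=newpath)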
This creates a directory and runs your command as a separate process in that directory. Since it doesn't wait for one subprocess to finish before starting the next, this gives the parallelism you want.
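Note that the loop above starts all 113 jobs at the same moment, which may be more than one Windows 7 machine can handle comfortably. A minimal sketch, assuming you would rather have only a few jobs running at once, uses a thread pool from the standard library (`multiprocessing.pool.ThreadPool`, a technique swapped in here, not part of the answer above). The names `run_one` and `run_all` and the default of 4 simultaneous jobs are hypothetical choices; it should work on Python 2.7 and 3.

```python
import os
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_one(job):
    """Create scatter.NNN, run abc.py inside it, wait, and return the exit code."""
    number, script, args = job
    folder = 'scatter.%03d' % number
    os.mkdir(folder)
    # Quote the interpreter and script paths in case they contain spaces.
    cmd = '"%s" "%s" %s' % (sys.executable, script, args)
    return subprocess.call(cmd, shell=True, cwd=folder)


def run_all(argfile='commandline.txt', script='abc.py', max_jobs=4):
    """Run one job per non-blank line of argfile, at most max_jobs at a time."""
    script = os.path.abspath(script)  # stays valid after cwd switches to scatter.NNN
    with open(argfile) as f:
        lines = [line.strip() for line in f if line.strip()]
    jobs = [(n, script, args) for n, args in enumerate(lines, 1)]
    pool = ThreadPool(max_jobs)
    try:
        # map() hands the jobs to max_jobs worker threads and
        # returns the exit codes in the same order as the lines in the file.
        return pool.map(run_one, jobs)
    finally:
        pool.close()
        pool.join()
```

To use it, save it next to abc.py and commandline.txt, add `print(run_all())` as the last line, and run it from that directory. It prints one exit code per job; 0 usually means the job succeeded. Threads are enough here because each thread only waits on a separate Python process that does the real work.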
For an in-series version, which waits for each subprocess to finish before starting the next one, replace the last line of the loop with these two lines:
p = Popen(cmd, shell=True, cwd=newpath)
p.wait()
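For reference, here is a minimal sketch of the complete in-series loop, assuming the same file layout as above. It uses `subprocess.call`, which starts the command and waits for it to finish (the same effect as `Popen(...)` followed by `wait()`), and it also records which jobs exit with a non-zero code. The function name `run_in_series` and the failure report are hypothetical additions, not part of the answer.

```python
import os
import subprocess
import sys


def run_in_series(argfile='commandline.txt', script='abc.py'):
    """Run the jobs one after another; return the numbers of the jobs that failed."""
    script = os.path.abspath(script)  # absolute path, so it works from inside scatter.NNN
    failed = []
    with open(argfile) as f:
        # The folder number is the line number in argfile (counting from 1).
        for number, line in enumerate(f, 1):
            args = line.strip()
            if not args:
                continue  # skip blank lines
            folder = 'scatter.%03d' % number
            os.mkdir(folder)
            cmd = '"%s" "%s" %s' % (sys.executable, script, args)
            # call() blocks until this job finishes, so only one job runs at a time.
            if subprocess.call(cmd, shell=True, cwd=folder) != 0:
                failed.append(number)
    return failed
```

Add `print(run_in_series())` at the end of the script; an empty list means every job exited with code 0.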
This Python script should do it in parallel:
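import os, subprocess

n = 1  # start at 1 so the folders are scatter.001 ... scatter.113
for cmd in open('commandline.txt'):
    newpath = 'scatter.%03d' % n
    os.mkdir(newpath)
    # strip() removes the trailing newline that each line of the file ends with
    subprocess.Popen("..\\abc.py " + cmd.strip(), shell=True, cwd=newpath)
    n += 1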
Note that this assumes abc.py and commandline.txt are both in the directory you run this script from. If that is not the case, you would have to replace "..\\abc.py " with the full path, something like "C:\\path\\to\\abc.py ".