I have a function (a neural network model) that produces figures. I want to test several parameters, methods, and different inputs (meaning hundreds of runs of the function) from Python, using PBS on a standard cluster with Torque.
Note: I tried parallelpython, ipython and the like and was never completely satisfied, since I want something simpler. The cluster has a fixed configuration that I cannot change, and a solution integrating Python + qsub would certainly benefit the community.
To simplify things, consider a function such as:
import pylab
import myModule

def model(input, a=1., N=100):
    do_lots_number_crunching(input, a, N)
    pylab.savefig('figure_' + input.name + '_' + str(a) + '_' + str(N) + '.png')
where input is an object representing the input, input.name is a string, and do_lots_number_crunching may last hours.
My question is: is there a correct way to transform a parameter scan such as
for a in pylab.linspace(0., 1., 100):
    model(input, a)
into "something" that would launch a PBS script for every call to the model
function?
#PBS -l ncpus=1
#PBS -l mem=1000mb
#PBS -l cput=24:00:00
#PBS -V
cd /data/work/
python experiment_model.py
I was thinking of a function that would include the PBS template and call it from the Python script, but I could not figure it out yet (a decorator, perhaps?).
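To make the goal concrete, here is a rough sketch of the kind of wrapper I have in mind (submit_model and the template body are only illustrative, and this assumes qsub is available on the submission host):

import os
import subprocess
import tempfile
import pylab

PBS_TEMPLATE = """#PBS -l ncpus=1
#PBS -l mem=1000mb
#PBS -l cput=24:00:00
#PBS -V
cd /data/work/
python experiment_model.py %f
"""

def submit_model(a):
    # Write a throwaway job script with the parameter baked in.
    fd, script_name = tempfile.mkstemp(suffix='.job', dir='.')
    with os.fdopen(fd, 'w') as scriptf:
        scriptf.write(PBS_TEMPLATE % a)
    # qsub prints the new job id on stdout.
    job_id = subprocess.check_output(['qsub', script_name]).strip()
    os.remove(script_name)
    return job_id

jobs = [submit_model(a) for a in pylab.linspace(0., 1., 100)]
print jobs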
pbs_python[1] could work for this. If experiment_model.py took 'a' as a command-line argument, you could do:
import pbs, os
import pylab

server_name = pbs.pbs_default()
c = pbs.pbs_connect(server_name)

attropl = pbs.new_attropl(4)
attropl[0].name = pbs.ATTR_l
attropl[0].resource = 'ncpus'
attropl[0].value = '1'
attropl[1].name = pbs.ATTR_l
attropl[1].resource = 'mem'
attropl[1].value = '1000mb'
attropl[2].name = pbs.ATTR_l
attropl[2].resource = 'cput'
attropl[2].value = '24:00:00'
attropl[3].name = pbs.ATTR_V

script = '''
cd /data/work/
python experiment_model.py %f
'''

jobs = []
for a in pylab.linspace(0., 1., 100):
    script_name = 'experiment_model.job' + str(a)
    with open(script_name, 'w') as scriptf:
        scriptf.write(script % a)
    job_id = pbs.pbs_submit(c, attropl, script_name, 'NULL', 'NULL')
    jobs.append(job_id)
    os.remove(script_name)
print jobs
[1] pbs_python: https://oss.trac.surfsara.nl/pbs_python/wiki/TorqueUsage
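For completeness, experiment_model.py itself only needs to read the parameter off the command line. A rough sketch (load_input is a hypothetical helper standing in for however you construct the input object):

# experiment_model.py -- illustrative worker script
import sys
import myModule

a = float(sys.argv[1])            # parameter written into the job script
input = myModule.load_input()     # hypothetical: build the input object
myModule.model(input, a)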
You can do this easily using jug (which I developed for a similar setup).
You'd write in a file (e.g., model.py):
from jug import TaskGenerator
from matplotlib import pyplot
import numpy as np

@TaskGenerator
def model(param1, param2):
    res = complex_computation(param1, param2)   # stand-in for the real work
    pyplot.coolgraph(res)                       # stand-in for the real plotting

for param1 in np.linspace(0., 1., 100):
    for param2 in xrange(2000):
        model(param1, param2)
And that's it!
Now you can launch "jug jobs" on your queue: jug execute model.py, and this will parallelise automatically. What happens is that each job will, in a loop, do something like:
while not all_done():
    for t in tasks_that_i_can_run():
        if t.lock_for_me():
            t.run()
(It's actually more complicated than that, but you get the point).
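On Torque, "launching jug jobs" just means qsub-ing the same tiny script several times; each submission becomes one worker. A sketch reusing the template from the question:

#PBS -l ncpus=1
#PBS -l cput=24:00:00
#PBS -V
cd /data/work/
jug execute model.py

Submit it, say, 10 times and you get 10 workers cooperating on the same task list.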
It uses the filesystem for locking (if you're on an NFS system) or a redis server if you prefer. It can also handle dependencies between tasks.
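For example, dependencies come for free when you pass one task's result into another; jug then runs the consumer only after the producer finishes. A small sketch with made-up preprocess/analyse stages:

from jug import TaskGenerator

@TaskGenerator
def preprocess(raw):
    return raw * 2          # stand-in for real preprocessing

@TaskGenerator
def analyse(clean, a):
    return clean + a        # stand-in for real analysis

# analyse() receives a Task object, so jug schedules it only after
# the corresponding preprocess() task has completed.
for raw in range(10):
    clean = preprocess(raw)
    for a in (0.1, 0.5, 1.0):
        analyse(clean, a)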
This is not exactly what you asked for, but I believe it's a cleaner architecture to separate this from the job queueing system.