 

Excluding nodes from a qsub command under SGE

I have more than 200 jobs that I need to submit to an SGE cluster. I'll be submitting them to two queues. One of the queues has a machine that I don't want to submit jobs to. How can I exclude that machine? The only thing I found that might be helpful is the following (assuming three valid nodes are available to q1 and all the nodes available to q2 are valid):

qsub -q q1.q@n1,q1.q@n2,q1.q@n3,q2.q
Yotam asked Dec 13 '12 13:12




2 Answers

Assuming the node you don't want to run on is called n4, adding the following to your script should work.

#$ -l h=!n4

If you add the -l option to the qsub command line rather than embedding it in the submitted script, most shells will require the exclamation mark to be quoted.
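As a rough sketch of the command-line form, here is a small dry-run wrapper. The function name submit_excluding, the QSUB variable, and the script name my_prog.sh are placeholders made up for this example; only the -l h=!n4 request comes from the answer.

```shell
#!/bin/sh
# Dry run by default: prints the qsub command instead of submitting it.
# Run with QSUB=qsub in the environment to submit for real.
QSUB="${QSUB:-echo qsub}"

submit_excluding() {
    # $1 = host to keep jobs off, $2 = job script (placeholder names here).
    # The single quotes around "!" stop an interactive bash from treating
    # it as a history expansion.
    $QSUB -l 'h=!'"$1" "$2"
}

submit_excluding n4 my_prog.sh
```

Typed directly at a prompt, the same request would be written as qsub -l 'h=!n4' my_prog.sh.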

William Hay answered Sep 22 '22 06:09


The best way I've found for this is to set up a custom resource on the nodes that you want to allow the execution on, then require that resource when you submit the job.

In qmon, go to the "complex" configuration and add a new attribute. Set the name to something like "my_test" and the shortcut to something like "m_t", the type to BOOL, the relation to ==, requestable to Yes, and consumable to No, then "Add" it. Commit your changes to the complex configuration.
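For reference, those qmon settings should correspond to a single line in the complex list that "qconf -sc" prints. A sketch of that line, using the name my_test that the commands further down request; the shortcut m_t, the helper name complex_line, and the two trailing zeros (default and urgency) are assumptions of mine:

```shell
#!/bin/sh
# Print a complex definition in the column order used by "qconf -sc":
# name  shortcut  type  relop  requestable  consumable  default  urgency
complex_line() {
    # $1 = attribute name, $2 = shortcut (placeholders chosen for this example)
    printf '%s %s BOOL == YES NO 0 0\n' "$1" "$2"
}

complex_line my_test m_t
```

A line like this can be pasted into the editor that "qconf -mc" opens, which avoids qmon altogether.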

The next step is probably easier to do from the command line, but you can do it in qmon as well. You need to add the new complex to each host that you're going to allow your job to run on. In qmon, go to the host configuration, select the execution host tab, and open each host in turn; on the consumables/fixed attributes tab, add the complex that you just configured above with "True" as the value. From the command line, you can get a list of your execution hosts with "qconf -sel". This list is suitable for passing to a loop after grepping out the host(s) you don't want included. Do something like this:

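# Add my_test=True to every execution host except host_to_exclude.
qconf -sel | grep -v host_to_exclude | while read -r host; do
    # qconf -me opens the host config in $EDITOR; ed reads its commands from stdin.
    EDITOR="ed" qconf -me "$host" <<EOL
/complex_values/s/$/,my_test=True/
w
q
EOL
done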

This lets you edit the host configuration programmatically (qconf doesn't normally allow this, since it wants to start your editor for you). It works by setting the editor to "ed" (make sure the ed editor is installed; try running it by hand first and type "q" to get out). ed takes its list of editing commands on its stdin, so we give it three commands. The first edits the line containing complex_values to append the my_test value. The second writes out the temporary file, and the third quits ed.
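If ed isn't available, qconf can, as far as I know, also add a value to a single attribute without opening an editor, via its -aattr option. This is a different route from the ed trick above, sketched here as a dry run; the function name tag_hosts, the QCONF variable, and the sample host names are placeholders for illustration.

```shell
#!/bin/sh
# Dry run by default: prints each qconf command instead of running it.
# Run with QCONF=qconf in the environment to apply the changes.
QCONF="${QCONF:-echo qconf}"

tag_hosts() {
    # $1 = host to skip; the host list (one per line) is read from stdin,
    # e.g. from "qconf -sel".
    grep -v -x -F "$1" | while read -r host; do
        # Adds my_test=True to the host's complex_values list.
        # </dev/null keeps the command from consuming the host list.
        $QCONF -aattr exechost complex_values my_test=True "$host" </dev/null
    done
}

# Sample host names stand in for the output of "qconf -sel".
printf '%s\n' n1 n2 host_to_exclude n3 | tag_hosts host_to_exclude
```

On a real cluster the last line would become qconf -sel | tag_hosts host_to_exclude, and "qconf -se" on a host should then list my_test=True under complex_values.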

Once you've done this, submit your jobs with a resource request (the -l option) that requires your new complex:

qsub -q whatever -l my_test=True my_prog.sh

The -l option requests a resource, and my_test=True says the job can only run on hosts that have the complex my_test with a value of True. Since the complex isn't consumable, each host can still run as many jobs as it wants (up to the host's slot limit), but the job will avoid any hosts that don't have the my_test complex set to True.
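For the original case of a couple of hundred jobs going to two queues, the same request can be added in a loop. A dry-run sketch, assuming the queue names q1.q and q2.q from the question; the function name submit_all, the QSUB variable, and the job script names are hypothetical:

```shell
#!/bin/sh
# Dry run by default: prints the qsub commands instead of submitting them.
# Run with QSUB=qsub in the environment to submit for real.
QSUB="${QSUB:-echo qsub}"

submit_all() {
    # Each argument is a job script; every job is offered to both queues,
    # but only to hosts where my_test is True.
    for job in "$@"; do
        $QSUB -q q1.q,q2.q -l my_test=True "$job"
    done
}

submit_all job_001.sh job_002.sh
```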

jlp answered Sep 24 '22 06:09