I'm trying to submit jobs to SGE. This has worked the same way for me in the past, but now all jobs are stuck in the qw state.
"qstat -g c" output:
> CLUSTER QUEUE   CQLOAD   USED   AVAIL   TOTAL
> all.q             0.38      0     160    1920
> gpu6.q            -NA-      0       0       4
> par6.q            0.38    750     135    1800
> seq6.q            0.41    103     170     416
> smp3.q            1.01      0       0      96
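I am not sure whether the -NA- load on gpu6.q or the zero free slots in smp3.q are related to this. As far as I know, disabled or alarm/error queue instances can be spotted with the standard qstat state filters, so I assume something like the following would show them (plain SGE 6.x options, nothing site-specific):
> qstat -f -qs acdsuE      # show only queue instances in an alarm/calendar/disabled/suspended/unknown/error state
> qstat -f -explain E      # full listing with the reason for any queue instance in error state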
"qstat" output looks like always.
Googling only gave me hints for people with root access which I don't have. Suggestions anyone?
Thanks.
Edit: Jobs were submitted via "qsub -q seq6.q scriptname" (or the same with smp3.q or par6.q).
"qstat -j jobid" gives nothing special as far as I can see:
job_number: 2821318
exec_file: job_scripts/2821318
submission_time: Wed Mar 4 12:07:15 2015
owner: username
uid: 31519
group: dch
gid: 1150
sge_o_home: /home/hudson/pg/username
sge_o_log_name: username
sge_o_path: /gpfs/hamilton6/apps/intel_comp_2014/composer_xe_2013_sp1.2.144/bin/intel64:/usr/local/bin:/bin:/usr/bin:/usr/lpp/mmfs/bin:/usr/local/Cluster-Apps/sge/6.1u6/bin/lx24-amd64:/panfs/panasas1.hpc.dur.ac.uk/apps/nag/fll6a21dpl/scripts
sge_o_shell: /bin/tcsh
sge_o_workdir: /panfs/panasas1.hpc.dur.ac.uk/username/path
sge_o_host: hamilton1
account: sge
mail_list: username@hamilton1
notify: FALSE
job_name: scriptname
jobshare: 0
hard_queue_list: seq6.q
env_list:
script_file: scriptname
scheduling info: (Collecting of scheduler job information is turned off)
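That last line is the catch: with scheduler job information turned off, "qstat -j" will not tell me why the scheduler skips the job. If I read the man pages right, the closest I can get without root is a verify-mode check, which prints a validation report instead of scheduling the job (-w v is a stock SGE qsub/qalter option; the job id is just the one from above):
> qalter -w v 2821318              # verify whether the pending job could be dispatched at all (assumes an otherwise empty cluster)
> qsub -w v -q seq6.q scriptname   # same check at submission time, without actually queuing the job
Otherwise I guess only an admin can re-enable the scheduler messages (schedd_job_info in the scheduler configuration, via qconf -msconf).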
I had the same issue today. We are running Univa Grid Engine for a customer. I configured some complexes on the master host for jobs that request a lot of memory (h_stack=64M, memory_free=4G, virtual_free=4G). After this configuration change, jobs hang in the waiting queue. The previous configuration (3G) had worked for many years on all our execution hosts. I will test the new config (4G) over the next few days. All servers have enough memory! Ingo
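What I plan to check is whether the requested complexes can be satisfied at all, i.e. compare what the jobs ask for with what the execution hosts report and with how the complexes are defined. A rough sketch with standard qhost/qconf/qstat calls (the resource names are just my complexes, and <exec_host>/<jobid> are placeholders):
> qhost -F memory_free,virtual_free                          # current value of these resources on each execution host
> qconf -se <exec_host>                                      # complex_values configured for one execution host
> qconf -sc | grep -E 'memory_free|virtual_free|h_stack'     # how the complexes are defined (consumable, requestable, default)
> qstat -j <jobid> | grep resource_list                      # what a waiting job actually requests
If a job requests more of a consumable than any host offers in complex_values, it stays in qw forever, which is what I suspect happened after raising 3G to 4G.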