Sun Grid Engine Jobs
Quick Start
Log in to merlin00. Add the Sun Grid Engine module sge/n1ge6 if needed:
module list
module add sge/n1ge6
module list
Currently Loaded Modulefiles:
1) sge/n1ge6
Test it by running the qconf command to show the list of execution hosts and the list of queues:
qconf -sel
qconf -sql
man qconf
Create a simple job script simple_env.sge:
#!/bin/bash
### Change to the current working directory:
#$ -cwd
### Job name (also used as the prefix of the output file names):
#$ -N simple_env
MY_HOST=$(hostname)
MY_DATE=$(date)
echo "Running on $MY_HOST at $MY_DATE"
echo "Running environment:"
env
echo "================================================================"
# Put your single-CPU script here
Submit it to the default queue:
qsub simple_env.sge
qstat
man qsub
The STDOUT and STDERR will be written to the files simple_env.o$JOB_ID and simple_env.e$JOB_ID where $JOB_ID is the job ID.
ls -lA simple_env.o*
ls -lA simple_env.e*
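As a rough sketch of how the output file names relate to the job, the helpers below extract the job ID from the message printed by qsub and build the two file names. The function names are hypothetical, and the message format (Your job ID ("NAME") has been submitted) is an assumption based on typical Grid Engine output, so check it on your installation.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Grid Engine.

# Extract the numeric job ID from a qsub submission message read on stdin.
# Assumes a message like: Your job 12345 ("simple_env") has been submitted
parse_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Print the STDOUT and STDERR file names for a job name and a job ID.
output_files() {
    local name=$1 id=$2
    echo "${name}.o${id} ${name}.e${id}"
}

# Example with a canned message instead of a real submission:
msg='Your job 12345 ("simple_env") has been submitted'
job_id=$(echo "$msg" | parse_job_id)
output_files simple_env "$job_id"
```

On the cluster, the canned message could be replaced by the output of the qsub command itself.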
Edit the sample script to start your single-CPU program. Submit it to the all.q queue:
qsub -q all.q my_job.sge
qstat
Resource Requirements
Use the command qconf -sql to list all queues, e.g.:
qconf -sql
all.q
background.q
long.q
test.q
The default queue is all.q. Use it for production jobs. For development, use the test queue test.q, which has a short time limit (1 hour) and allows at most 8 CPUs.
To show the configuration of a queue use the command qconf -sq queue_name, e.g.:
qconf -sq all.q
qname all.q
hostlist @production
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpi mpi_fill_up pvm
rerun FALSE
slots 4,[merlin11=4],[merlin10=4],[merlin09=4],[merlin08=4], \
[merlin07=4],[merlin05=4],[merlin04=4],[merlin03=4], \
[merlin02=4],[merlin01=4],[merlin12=4],[merlin13=4], \
[merlin14=4],[merlin06=4],[merlin15=4],[merlin16=4], \
[merlin17=4],[merlin18=4],[merlin19=4],[merlin20=4], \
[merlin21=4],[merlin22=4],[merlin23=4],[merlin24=4]
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode unix_behavior
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list background.q
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt 48:00:00 # Soft run time limit in hh:mm:ss
h_rt 48:30:00 # Hard run time limit in hh:mm:ss
s_cpu 24:00:00 # Soft CPU time limit in hh:mm:ss
h_cpu 24:30:00 # Hard CPU time limit in hh:mm:ss
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
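To pick a single value out of such a listing, a small filter may be enough. The sketch below uses a hypothetical helper, queue_attr, that prints the second field of the line whose first field matches the attribute name. It is meant for single-valued attributes such as h_rt; for multi-valued ones such as pe_list it would print only the first value.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one attribute from
# `qconf -sq` output read on stdin.
queue_attr() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Example with a canned excerpt of the listing. On the cluster you
# might run instead: qconf -sq all.q | queue_attr h_rt
printf 'qname all.q\ns_rt 48:00:00\nh_rt 48:30:00\n' | queue_attr h_rt
```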
You may wish to specify the time limits for your job using the -l option of the qsub command. For example, set the soft run time limit to 10 min and the hard run time limit to 11 min for a job running in the queue test.q:
qsub -l s_rt=00:10:00,h_rt=00:11:00 -q test.q my_job.sge
You may specify several resources as comma-separated key-value pairs in the key=value format (no spaces; quote strings if needed).
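The resource list can also be assembled in a script. The following is a minimal sketch with two hypothetical helpers: to_hms converts seconds to the hh:mm:ss format, and join_resources joins key=value pairs with commas. The final command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical helpers for building the argument of qsub -l.

# Convert a number of seconds to hh:mm:ss.
to_hms() {
    local s=$1
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Join key=value pairs with commas and no spaces.
join_resources() {
    local IFS=,
    echo "$*"
}

# 600 s = 10 min soft limit, 660 s = 11 min hard limit.
limits=$(join_resources "s_rt=$(to_hms 600)" "h_rt=$(to_hms 660)")
echo "qsub -l $limits -q test.q my_job.sge"
```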
Monitoring and Controlling Jobs
In progress ...
qstat
qstat -f
qstat -ext
qstat -pri
qstat -urg
qstat -g c
qstat -q queue
qstat -j job -explain
qstat -j job
qmod -sj job
qmod -usj job
qalter -R y job
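A script that has to wait for a job can poll qstat -j. The sketch below is a hypothetical helper, wait_for_job. It assumes that qstat -j exits with a non-zero status once the job has left the system, which you should verify on your installation. The QSTAT variable is an illustration-only hook that allows a dry run without Grid Engine.

```shell
#!/bin/bash
# Hypothetical helper: block until a job is no longer known to the scheduler.
# Assumes `qstat -j JOB` returns a non-zero exit status for a finished job.
wait_for_job() {
    local job=$1 interval=${2:-30}
    while "${QSTAT:-qstat}" -j "$job" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job is no longer listed"
}

# Dry run: `false` stands in for qstat and reports the job as gone at once.
QSTAT=false wait_for_job 12345 1
```

On the cluster the call would simply be wait_for_job followed by the job ID and, optionally, the polling interval in seconds.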
In progress ...
qconf -ssconf
qconf -sstree
urg = rrcontr + wtcontr + dlcontr
 |       |         |         |-- deadline contribution
 |       |         |-- waiting time contribution
 |       |-- resource requirement contribution
 |-- urgency
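Assuming the urgency value is the sum of the three contributions, as described in sge_priority(5), the arithmetic can be sketched as follows. The helper name and the contribution values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: add up the three urgency contributions
# (resource requirement, waiting time, deadline).
urgency() {
    awk -v rr="$1" -v wt="$2" -v dl="$3" 'BEGIN { print rr + wt + dl }'
}

# Made-up contribution values, for illustration only.
urgency 1000 25 0
```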
Change Job Priority
Change a job priority:
qalter -p value job
qalter -p 500 12345
The change will appear in the qstat output after a while.
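According to qsub(1), the priority is an integer between -1023 and 1024, and ordinary users can usually only lower it below the default of 0; raising it typically requires operator or manager rights. A wrapper script might check the value before calling qalter. The helper below is a hypothetical sketch of such a check.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the argument is an integer in the
# range -1023..1024, the priority range documented in qsub(1).
valid_priority() {
    case $1 in
        ''|-|*[!0-9-]*|?*-*) return 1 ;;   # empty, lone minus, or not an integer
    esac
    [ "$1" -ge -1023 ] && [ "$1" -le 1024 ]
}

for p in 500 -100 2000; do
    if valid_priority "$p"; then
        echo "$p: ok"
    else
        echo "$p: out of range"
    fi
done
```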
man qstat
man qmod
man qalter
man qconf
man sched_conf
Accounting
The qacct utility scans the accounting data file (see accounting(5)) and produces a summary of information for wall-clock time, cpu-time, and system time for the categories of hostname, queue-name, group-name, owner-name, job-name, job-ID and for the queues meeting the resource requirements as specified with the -l switch. Combinations of each category are permitted. Alternatively, all or specific jobs can be listed with the -j switch. For example the search criteria could include summarizing for a queue and an owner, but not for two queues in the same request.
man qacct
qacct -b MMDDhhmm [-e MMDDhhmm]
qacct -o [owner]
qacct -slots
Examples:
qacct -o -b 02010000 -e 02220000
qacct -o -slots
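The MMDDhhmm time stamps can be generated instead of typed by hand. The following sketch uses a hypothetical helper, qacct_time, that zero-pads month, day, hour and minute. The qacct command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical helper: format month, day, hour and minute as MMDDhhmm.
# The 10# prefix forces base 10 so that inputs such as 08 are not read as octal.
qacct_time() {
    printf '%02d%02d%02d%02d\n' $((10#$1)) $((10#$2)) $((10#$3)) $((10#$4))
}

begin=$(qacct_time 2 1 0 0)     # 1 February, 00:00
end=$(qacct_time 2 22 0 0)      # 22 February, 00:00
echo "qacct -o -b $begin -e $end"
```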
Last modified: 2007-02-22 V.M.
