Paul Scherrer Institut PSI AIT LINUX Support






Sun Grid Engine Jobs


Quick Start

Log in to merlin00. Add the Sun Grid Engine module sge/n1ge6 if it is not already loaded:

  module list
  module add sge/n1ge6
  module list
  Currently Loaded Modulefiles:
    1) sge/n1ge6

Test it by running the qconf command to show the list of execution hosts and the list of queues:

  qconf -sel
  qconf -sql
  man qconf

Create a simple job script simple_env.sge:

#!/bin/bash
### Change to the current working directory:
#$ -cwd
### Job name:
#$ -N simple_env
MY_HOST=`hostname`
MY_DATE=`date`
echo "Running on $MY_HOST at $MY_DATE"
echo "Running environment:"
env
echo "================================================================"
# Put your single-CPU script here

Submit it to the default queue:

  qsub simple_env.sge
  qstat
  man qsub
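
When a submission is scripted, the job ID is usually needed for later qstat or qalter calls. The sketch below assumes qsub prints its usual confirmation line of the form Your job 12345 ("simple_env") has been submitted; the helper name parse_job_id is made up for this example, and array jobs (which print "Your job-array ...") are not handled.

```shell
#!/bin/bash
# Extract the numeric job ID from the confirmation line printed by qsub,
# e.g.:  Your job 12345 ("simple_env") has been submitted
# Prints nothing if the line does not have that form.
parse_job_id() {
    # $1: the confirmation line
    echo "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Typical use on the submit host (not executed here):
#   JOB_ID=$(parse_job_id "$(qsub simple_env.sge)")
#   qstat -j "$JOB_ID"
```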

The STDOUT and STDERR will be written to the files simple_env.o$JOB_ID and simple_env.e$JOB_ID where $JOB_ID is the job ID.

   ls -lA simple_env.o*
   ls -lA simple_env.e*
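
After several submissions the directory holds one pair of files per job, and a plain ls sorts them as text (simple_env.o12 before simple_env.o9). A minimal sketch for picking the output file with the highest job ID follows; the function name latest_output is hypothetical, and it assumes the files are in the current directory and that the job is named simple_env.

```shell
#!/bin/bash
# Print NAME.o<ID> for the largest numeric <ID> found in the current directory.
latest_output() {
    # $1: job name, e.g. simple_env
    local name=$1 f id best=-1
    for f in "$name".o*; do
        [ -e "$f" ] || continue          # the pattern matched nothing
        id=${f##*.o}                     # text after the last ".o"
        case $id in
            ''|*[!0-9]*) continue ;;     # skip suffixes that are not numeric
        esac
        if [ "$id" -gt "$best" ]; then
            best=$id
        fi
    done
    if [ "$best" -ge 0 ]; then
        echo "$name.o$best"
    fi
}

# Typical use (not executed here):
#   cat "$(latest_output simple_env)"
```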

Edit the sample script to start your single-CPU program and save it, for example, as my_job.sge. Submit it to the all.q queue:

  qsub -q all.q my_job.sge
  qstat

Resource Requirements

Use the command qconf -sql to list all queues, e.g.:

  qconf -sql
all.q
background.q
long.q
test.q

The default queue is all.q. Normally, use the default queue for production jobs and the test queue test.q, which has a short time limit (1 hour) and allows at most 8 CPUs, for development.

To show the configuration of a queue use the command qconf -sq queue_name, e.g.:

  qconf -sq all.q
qname                 all.q
hostlist              @production
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpi_fill_up pvm
rerun                 FALSE
slots                 4,[merlin11=4],[merlin10=4],[merlin09=4],[merlin08=4], \
                      [merlin07=4],[merlin05=4],[merlin04=4],[merlin03=4], \
                      [merlin02=4],[merlin01=4],[merlin12=4],[merlin13=4], \
                      [merlin14=4],[merlin06=4],[merlin15=4],[merlin16=4], \
                      [merlin17=4],[merlin18=4],[merlin19=4],[merlin20=4], \
                      [merlin21=4],[merlin22=4],[merlin23=4],[merlin24=4]
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      background.q
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  48:00:00  # Soft run time limit in hh:mm:ss
h_rt                  48:30:00  # Hard run time limit in hh:mm:ss
s_cpu                 24:00:00  # Soft CPU time limit in hh:mm:ss
h_cpu                 24:30:00  # Hard CPU time limit in hh:mm:ss
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
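
To read a single attribute from such a listing in a script, the output can be filtered with awk. This is only a sketch: the helper name queue_attr is made up here, and it returns just the second column of the first matching line, so multi-line values such as slots come back truncated.

```shell
#!/bin/bash
# Print the value of one attribute from "qconf -sq" output read on stdin.
queue_attr() {
    # $1: attribute name, e.g. h_rt
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Typical use on the submit host (not executed here):
#   qconf -sq all.q | queue_attr h_rt
```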

You may wish to specify time limits for your job using the -l option of the qsub command. For example, to set the soft run time limit to 10 minutes and the hard run time limit to 11 minutes for a job running in the queue test.q:

  qsub -l s_rt=00:10:00,h_rt=00:11:00 -q test.q my_job.sge

You may specify several resources as comma-separated key-value pairs in the key=value format (no spaces; quote strings if needed).
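
The same requests can also be embedded in the job script with the #$ prefix, so that a plain qsub my_job.sge is enough. The sketch below additionally traps SIGUSR1, which Grid Engine sends to a job when the soft run time limit (s_rt) is reached, before the hard limit (h_rt) kills it. The job name my_job, the handler name on_soft_limit and the sleep placeholder are assumptions made for this example.

```shell
#!/bin/bash
#$ -cwd
#$ -N my_job
#$ -q test.q
#$ -l s_rt=00:10:00,h_rt=00:11:00

SOFT_LIMIT_HIT=0

# Runs when the soft run time limit is reached (SIGUSR1), which leaves
# the time between s_rt and h_rt for saving results and cleaning up.
on_soft_limit() {
    echo "Soft run time limit reached, cleaning up" >&2
    SOFT_LIMIT_HIT=1
}
trap on_soft_limit USR1

# Placeholder workload; replace it with your program. Starting it in the
# background and waiting for it lets bash run the trap handler promptly.
sleep 1 &
wait $!
```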

 
 


Monitoring and Controlling Jobs

In progress ...

   qstat
   qstat -f
   qstat -ext
   qstat -pri
   qstat -urg
   qstat -g c
   qstat -q queue
   qstat -j job -explain

   qstat -j job

   qmod -sj job 
   qmod -usj job 

   qalter -R y job 
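
For a quick overview, the plain qstat listing can be summarized per job state. The sketch assumes the default qstat layout, with two header lines and the state (for example r or qw) in the fifth column; the function name count_job_states is made up for this example.

```shell
#!/bin/bash
# Count jobs per state from plain "qstat" output read on stdin.
# Skips the two header lines and tallies the fifth column.
count_job_states() {
    awk 'NR > 2 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Typical use on the submit host (not executed here):
#   qstat | count_job_states
```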

In progress ...

   qconf -ssconf
   qconf -sstree

       urg   =  rrcontr + wtcontr + dlcontr
        |        |         |         |-- deadline contribution
        |        |         |-- waiting time contribution
        |        |-- resource requirement contribution
        |-- urgency

Change Job Priority

Change a job priority:

   qalter -p value job 
   qalter -p  500  12345 
The change will be seen in qstat after a while.
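
According to the qsub and qalter manual pages, the priority is an integer between -1023 and 1024, and ordinary users may only lower the priority of their own jobs; raising it requires manager or operator rights. A small sketch that checks a value before it is passed to qalter (the function name valid_priority is hypothetical):

```shell
#!/bin/bash
# Succeed if $1 is an integer within the documented -p range (-1023..1024).
valid_priority() {
    case $1 in
        ''|-|*[!0-9-]*|?*-*) return 1 ;;   # not an optionally signed integer
    esac
    [ "$1" -ge -1023 ] && [ "$1" -le 1024 ]
}

# Typical use (not executed here):
#   valid_priority "$PRIO" && qalter -p "$PRIO" "$JOB_ID"
```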

  man qstat
  man qmod
  man qalter

  man qconf
  man sched_conf


Accounting

The qacct utility scans the accounting data file (see accounting(5)) and produces a summary of wall-clock time, CPU time, and system time for the categories of hostname, queue-name, group-name, owner-name, job-name, and job-ID, and for the queues meeting the resource requirements specified with the -l switch. Combinations of categories are permitted. Alternatively, all or specific jobs can be listed with the -j switch. For example, the search criteria could include summarizing for a queue and an owner, but not for two queues in the same request.

  man qacct
  qacct -b MMDDhhmm [-e MMDDhhmm]  
  qacct -o [owner]
  qacct -slots
Examples:
  qacct -o -b 02010000 -e 02220000
  qacct -o -slots
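
The MMDDhhmm timestamps do not have to be typed by hand. The sketch below builds them with date, assuming GNU date (as found on Linux) for the -d option; the function name qacct_stamp is made up for this example.

```shell
#!/bin/bash
# Convert a date string understood by GNU date into the MMDDhhmm
# form expected by the -b and -e options of qacct.
qacct_stamp() {
    # $1: date string, e.g. "2007-02-01 00:00" or "7 days ago"
    date -d "$1" +%m%d%H%M
}

# Typical use on the submit host (not executed here):
#   qacct -o -b "$(qacct_stamp '7 days ago')"
```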



Last modified: 2007-02-22 V.M.