The SLACVX Batch system
The Batch system on SLACVX is designed to
- Allow optimum spread of jobs across the various machines
in the SLACVX cluster taking into account the different uses
and capabilities of these machines.
- Allow optimum use of scarce resources such as tape drives and
tapes on the SLACVX cluster.
- Allow for machines to be used for both MC production running
and as user analysis machines.
- Allow for different job mixes at different times of day/week.
The VMS batch system provides most of the capabilities needed
to achieve these goals.
SLD has added a small amount of extra
functionality to address the following
limitations of the VMS batch system.
- VMS batch does not take into account the differing CPU power of
different machines (e.g. a job which takes 1 hour on a VAX 4000 may
take only 30 minutes on an ALPHA).
- Over-reliance on the VMS scheduling priority results in low
priority jobs languishing without any CPU while tying up valuable
resources such as tape drives and memory.
- VMS batch does not provide adequate facilities for specifying
relative priorities of jobs, e.g. this queue should run user analysis jobs,
but if there are no analysis jobs then it should run reconstruction jobs,
but if there are no reconstruction jobs it should run MC jobs.
In addition the SLD batch system provides some improved tools for monitoring
and controlling the batch queues. The design of the system is modeled on
the hallowed SLACVM batch system
(said to have been designed in the good old days of SLAC computing know-how).
Jobs are submitted using the standard VMS
SUBMIT command.
All jobs should be submitted to one of the following queues:
Queue Time Limit
============ ==========
SLD_EXPRESS 2 minutes
SLD_FAST 8 minutes
SLD_SHORT 20 minutes
SLD_MEDIUM 1 hour
SLD_LONG 3 hours
SLD_CRUNCH Infinite
SLD_STAGE Reserved for production use
SLD_MC Reserved for production use
A process called JOB_JUGGLER will then move your job from one of
these queues to an appropriate execution queue (sending you a message
if you specified /NOTIFY on your submit command).
You can control the execution of your job by specifying CHARACTERISTICS
when you submit the job. Important characteristics for regular users are:
- CART
- MUST be specified if your job will access cartridges
- VAX
- If you would like to prevent your job from running
on an ALPHA.
You specify these characteristics as follows:
$ SUBMIT/QUEUE=SLD_LONG/CHAR=VAX
$ SUBMIT/QUEUE=SLD_MEDIUM/CHAR=(VAX,CART)
Other characteristics allow you to control which node your job will run on,
e.g.:
- ALPHA
- Only run on an ALPHA
- VAX
- Only run on a VAX (i.e. NOT an ALPHA)
- FDDI
- Only run on a machine that has FDDI access
- JNET
- Only run on a machine that has JNET access
- SLACVX, SLDA1,
SLDB1, SLDB2,
SLDB3, SLDB4,
SLDB5, SLDB6, NOSLACVX
- Force job to run on a specific node.
All the normal VMS commands for querying jobs and queues can of course
still be used, but an improved command BATQ is also made available as
part of the SLD batch system, which provides a relatively succinct summary
of all the jobs running or queued in the system, together with how
much CPU time each job has so far used. The command can be used as follows:
- BATQ
- shows all jobs
- BATQ/USER=USHER
- shows all of Tracy's jobs
- BATQ/PENDING
- only shows pending jobs
- BATQ/IGNORE=SLDPENDING
- ignores those zillions of SLDMCM pending jobs
- BATQ/CLASS=X
- show all jobs in class X
The BATQ command is now implemented as a C program
(thanks to David
Williams)
Under the SLD batch system the queues which users submit their jobs to are
all generic queues which feed into a dummy execution queue which is never
started. In the absence of any outside intervention users' jobs would sit in
these queues forever.
The core of the SLD batch system is a single DCL command file
(JOB_JUGGLER.COM) which runs as a batch job in each of the execution
queues.
This implementation method was chosen for speed of implementation and ease of maintenance.
The JOB_JUGGLER always has the lowest queuing priority so any other job
in the queue will go ahead of it, but if the execution queue has an empty slot the
JOB_JUGGLER will start and will then begin searching for eligible jobs to run.
The system is controlled by three sets of configuration files. The first set
consists of the execution configuration file EXECUTION.config,
which lists all of the execution queues in which job_juggler is to run, and the
generic configuration file GENERIC.config,
which lists all of the generic queues that the system is to manage, assigns
a one-letter job class to each queue, and sets a maximum CPU time that jobs
submitted in each queue can use.
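Purely as an illustration of the structure just described (the actual file
layout is not documented here, so both the format and the class letters are
hypothetical), a GENERIC.config might map each generic queue to a class
letter and a maximum CPU time:

```
! Hypothetical GENERIC.config layout: queue name, job class, max CPU time
SLD_EXPRESS   E   00:02:00
SLD_FAST      F   00:08:00
SLD_SHORT     S   00:20:00
SLD_MEDIUM    M   01:00:00
SLD_LONG      L   03:00:00
SLD_CRUNCH    C   NONE
```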
Next are the day configuration files:
- WEEKDAY.config
- used on weekdays
- WEEKEND.config
- used on weekends
- HOLIDAY.config
- used on holidays
These configuration files specify which execution configuration file the job_juggler should
use at different times. Currently this boils down to one of:
- PEAK.config
- used during peak times
- NON_PEAK.config
- used during non-peak times
- TRANSITION.config
- used during the transition from non-peak to peak
These files list each of the execution queues and specify the order in which job_juggler is to search
the job classes (= generic queues) when looking for jobs eligible to run in these queues. In addition
these files specify relative CPU powers for each queue (not yet implemented) and optionally a superseded queue
(see later).
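As a sketch only (the queue names, class letters, and column layout are all
hypothetical, not the real file format), an execution configuration file such
as PEAK.config might then look something like:

```
! Hypothetical PEAK.config layout
! exec queue     class search order   rel CPU power   superseded queue
SLDB1_BATCH      A,R,M                1.0
SLDA1_BATCH      A,R,M                2.0             SLDA1_MC
```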
When searching for eligible jobs job_juggler searches the generic queues in the order specified. To be
eligible to be run a job must satisfy the following requirements:
- The job must not request any CHARACTERISTICS that the queue does not possess.
- The minimum of the generic queue time limit and any time limit specified when the
job was submitted must not exceed the time limit for the execution queue multiplied
by the relative CPU power for the queue.
If an eligible job is found it is moved into the execution queue, then the job_juggler
resubmits itself and terminates. If no eligible jobs are found the job_juggler waits
a certain time (currently four minutes) and then wakes up and looks for more jobs. After
50 unsuccessful attempts to find an eligible job the job_juggler resubmits itself and then
exits (to avoid problems with infinitely large log files or CPU time limits).
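The eligibility test and class search described above can be sketched as
follows. This is a minimal Python illustration of the logic, not the real
implementation (which is DCL); all class names, field names, and values here
are made up for the example.

```python
# Illustrative sketch of the job_juggler eligibility check and class search.
# All names and data structures are hypothetical.
from dataclasses import dataclass

@dataclass
class Job:
    characteristics: set   # characteristics requested at SUBMIT time
    cpu_limit: float       # minutes: min(generic queue limit, user limit)
    job_class: str         # one-letter class assigned by GENERIC.config

@dataclass
class ExecQueue:
    characteristics: set        # characteristics the queue possesses
    cpu_limit: float            # queue time limit, in minutes
    relative_cpu_power: float   # e.g. 2.0 for an ALPHA relative to a VAX
    class_search_order: list    # job classes to search, in priority order

def eligible(job, queue):
    """A job is eligible if the queue has every characteristic the job
    requests, and the job's effective time limit does not exceed the
    queue's limit scaled by the queue's relative CPU power."""
    if not job.characteristics <= queue.characteristics:
        return False
    return job.cpu_limit <= queue.cpu_limit * queue.relative_cpu_power

def find_job(queue, pending_by_class):
    """Search the job classes (= generic queues) in the configured order
    and return the first eligible job, or None."""
    for cls in queue.class_search_order:
        for job in pending_by_class.get(cls, []):
            if eligible(job, queue):
                return job
    return None
```

In the real system an eligible job is then requeued into the execution
queue; a None result corresponds to job_juggler sleeping and retrying.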
One special case concerns the CART characteristic. If the execution queue the job_juggler is running
in has the CART characteristic then job_juggler first checks to see if there are any free tape drives.
If not then it proceeds as if the queue did not have the CART characteristic (i.e. it will not
accept any jobs which require CART). If there are free tape drives it then checks that both the
DCSC and
SETUP systems are functioning correctly. If it finds that either system is bust it
- Sends a whining message (currently to TONYJ and CXGSYS).
- Proceeds as if the queue did not have the CART characteristic.
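The CART decision just described amounts to withdrawing the characteristic
when tapes cannot actually be used. A hedged Python sketch (the predicate
arguments stand in for the real drive and DCSC/SETUP checks, which are not
shown here):

```python
# Illustrative sketch of the CART special case; all names are hypothetical.
def effective_characteristics(queue_chars, free_tape_drives,
                              dcsc_ok, setup_ok, complain=print):
    """Return the characteristics the queue should offer right now.
    CART is withdrawn if no tape drive is free, or if either the DCSC
    or SETUP system is down (in which case a message is also sent)."""
    chars = set(queue_chars)
    if "CART" not in chars:
        return chars
    if free_tape_drives == 0:
        chars.discard("CART")   # proceed as if the queue had no CART
    elif not (dcsc_ok and setup_ok):
        complain("tape staging system unavailable")  # the 'whining message'
        chars.discard("CART")   # proceed as if the queue had no CART
    return chars
```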
This system attempts to prevent jobs which need tapes from starting when
they obviously cannot run. It is not perfect however, since the job_juggler does not know
how many tape drives a job will actually use, nor is there any mechanism to prevent two
jobs starting simultaneously when there is only one drive available. This is not fatal
however since SLD jobs wait if they need resources until these resources become available.
Finally, if job_juggler moves a job into an execution queue that has a
superseded queue specified then the superseded queue
is stopped, resulting in any jobs running in that queue being suspended. The queue
is only restarted when there are no longer any jobs running or eligible to be run in the
execution queue. This allows (for example) user jobs to completely replace MC jobs at
short notice, giving good turnaround for user jobs. (Note that suspended processes are
the first processes VMS swaps out of memory if physical memory is short).
Job_juggler keeps a log of all jobs which it moves as
job_juggler.log.
This log file records the owner, class, submission time and start time
of each job.
At some point in the future an analysis program may be written to generate
statistics from this log.
There are two tools for system maintainers:
- CHECKQ.COM
- Produces a listing of each queue on the cluster, showing
such things as maximum CPU time allowed, node, whether job_juggler is running
etc. for each queue.
- SETQ.COM
- Scans each queue and sets appropriate characteristics as well
as starting job_juggler on any queue it should be running on but is not.
Outstanding wish-list items:
- The relative CPU weightings have to actually be implemented (easy)
- A way of enforcing the CART characteristic to be specified for
jobs which use tapes would be nice (i.e. a way to kill jobs which attempt
to use tapes but which did not specify /CHAR=CART). (done!)
- Some sort of anti-flooding mechanism à la VM would be nice to stop
Homer
from using all the execution queues. (Experimental system now in place,
see Appendix.)
- A way of requeuing jobs which need a resource (tape, tape drive etc.)
which is currently unavailable would be nice. (hard - needs mods to SETUP)
- It would be nice if the BATQ command worked faster (this would probably
require rewriting it as a C program instead of a COM file). (done!)
Tony Johnson
February 1994
Updated August 1994