The SLACVX Batch system

Purpose of the SLD Batch system on SLACVX
Submitting Jobs
Monitoring Job Progress
Technical Details
Tools for system maintainers
Work that still needs to be done

Purpose of the SLD Batch system on SLACVX

The Batch system on SLACVX is designed to

Allow optimum spread of jobs across the various machines in the SLACVX cluster taking into account the different uses and capabilities of these machines.
Allow optimum use of scarce resources such as tape drives and tapes on the SLACVX cluster.
Allow for machines to be used for both MC production running and as user analysis machines.
Allow for different job mixes at different times of day/week.

The VMS batch system provides most of the capabilities needed to achieve these goals. SLD has added a small amount of extra functionality to address the following limitations of the VMS batch system.

VMS batch does not take into account differing CPU power of different machines (i.e. a job which takes 1 hour on a VAX 4000 will only take 30 minutes on an ALPHA).
Over-reliance on the VMS scheduling priority results in low-priority jobs languishing without any CPU while tying up valuable resources such as tape drives and memory.
VMS batch does not provide adequate facilities for specifying relative priorities of jobs, e.g. this queue should run user analysis jobs, but if there are no analysis jobs then it should run reconstruction jobs, but if there are no reconstruction jobs it should run MC jobs.

In addition the SLD batch system provides some improved tools for monitoring and controlling the batch queues. The design of the system is modeled on the hallowed SLACVM batch system (said to have been designed in the good old days of SLAC computing know-how).

Submitting Jobs

Jobs are submitted using the standard VMS SUBMIT command. All jobs should be submitted to one of the following queues:

  Queue            Time Limit                                                 
  ============     ========== 
  SLD_EXPRESS       2 minutes                                                  
  SLD_FAST          8 minutes                                                  
  SLD_SHORT        20 minutes                                                  
  SLD_MEDIUM        1 hour                                                      
  SLD_LONG          3 hours                                                    
  SLD_CRUNCH        Infinite                                                   
  SLD_STAGE         Reserved for production use                                
  SLD_MC            Reserved for production use

A process called JOB_JUGGLER will then move your job from one of these queues to an appropriate execution queue (sending you a message if you specified /NOTIFY on your submit command).

You can control the execution of your job by specifying CHARACTERISTICS when you submit the job. Important characteristics for regular users are:

CART: MUST be specified if your job will access cartridges
VAX: If you would like to prevent your job from running on an ALPHA.

You specify these characteristics as follows:

                                                                               
$ SUBMIT/QUEUE=SLD_LONG/CHAR=VAX                                               
$ SUBMIT/QUEUE=SLD_MEDIUM/CHAR=(VAX,CART)

Other characteristics allow you to control which node your job will run on, e.g.:

ALPHA: Only run on an ALPHA
VAX: Only run on a VAX (i.e. NOT an ALPHA)
FDDI: Only run on a machine that has FDDI access
JNET: Only run on a machine that has JNET access
SLACVX, SLDA1, SLDB1, SLDB2, SLDB3, SLDB4, SLDB5, SLDB6, NOSLACVX: Force job to run on a specific node.

Monitoring Job Progress

All the normal VMS commands for querying jobs and queues can of course still be used. The SLD batch system also provides an improved command, BATQ, which gives a relatively succinct summary of all the jobs running or queued in the system, together with how much CPU time each job has used so far. The command can be used as follows:

BATQ: shows all jobs
BATQ/USER=USHER: shows all of Tracy's jobs
BATQ/PENDING: only shows pending jobs
BATQ/IGNORE=SLDPENDING: ignores those zillions of SLDMCM pending jobs
BATQ/CLASS=X: shows all jobs in class X

The BATQ command is now implemented as a C program (thanks to David Williams).

Technical Details

Under the SLD batch system, the queues to which users submit their jobs are all generic queues which feed into a dummy execution queue which is never started. In the absence of any outside intervention, users' jobs would sit in these queues forever.

The core of the SLD batch system is a single DCL command file (JOB_JUGGLER.COM) which runs as a batch job in each of the execution queues. This implementation method was chosen for speed of implementation and ease of maintenance. The JOB_JUGGLER always has the lowest queuing priority, so any other job in the queue will go ahead of it, but if the execution queue has an empty slot the JOB_JUGGLER will start and will then begin searching for eligible jobs to run.

The system is controlled by three sets of configuration files. The first set consists of the execution configuration file EXECUTION.config, which lists all of the execution queues in which job_juggler is to run, and the generic configuration file GENERIC.config, which lists all of the generic queues that the system is to manage, assigns a one-letter job class to each queue, and sets a maximum CPU time that jobs submitted to each queue can use.

Next are the day configuration files:

WEEKDAY.config: used on weekdays
WEEKEND.config: used on weekends
HOLIDAY.config: used on holidays

These configuration files specify which execution configuration file the job_juggler should use at different times. Currently this boils down to one of:

PEAK.config: used during peak times
NON_PEAK.config: used during non-peak times
TRANSITION.config: used during transition from non-peak to peak

These files list each of the execution queues and specify the order in which job_juggler is to search the job classes (= generic queues) when looking for jobs eligible to run in these queues. In addition these files specify relative CPU powers for each queue (not yet implemented) and optionally a superseded queue (see later).
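The choice of day configuration file described above can be sketched in a few lines of Python. This is only an illustration, not the DCL used by JOB_JUGGLER.COM: the function name, the holiday set, and the rule that a holiday takes precedence over the day of the week are all assumptions, and the format of the files themselves is not shown here.

```python
from datetime import date

# Hypothetical holiday list; how the real system learns about holidays
# is not described in this document.
HOLIDAYS = {date(1994, 7, 4)}

def day_config_file(today, holidays=HOLIDAYS):
    """Return the name of the day configuration file for a given date.

    Assumption: a holiday takes precedence over weekday/weekend.
    """
    if today in holidays:
        return "HOLIDAY.config"
    if today.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return "WEEKEND.config"
    return "WEEKDAY.config"
```

The day file would then be consulted to decide which of PEAK.config, NON_PEAK.config or TRANSITION.config applies at the current time of day.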

When searching for eligible jobs, job_juggler searches the generic queues in the order specified. To be eligible to run, a job must satisfy the following requirements:

The job must not request any CHARACTERISTICS that the queue does not possess.
The minimum of the generic queue time limit and any time limit specified when the job was submitted must not exceed the time limit for the execution queue multiplied by the relative CPU power for the queue.
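The two requirements above amount to a subset test plus a time comparison. The following Python sketch illustrates them under stated assumptions: the dictionary keys, the use of minutes as the unit, and the use of infinity for "no limit" are all hypothetical, and the real check is done in DCL.

```python
import math

def is_eligible(job, queue):
    """Return True if `job` may run in execution queue `queue`.

    job:   {"characteristics": set, "generic_limit": minutes,
            "submit_limit": minutes (optional)}
    queue: {"characteristics": set, "limit": minutes, "cpu_power": float}
    All key names are illustrative assumptions.
    """
    # Requirement 1: the job must not request a characteristic
    # that the queue does not possess.
    if not job["characteristics"] <= queue["characteristics"]:
        return False
    # Requirement 2: min(generic queue limit, limit given at submit time)
    # must not exceed the queue limit scaled by its relative CPU power.
    requested = min(job["generic_limit"], job.get("submit_limit", math.inf))
    return requested <= queue["limit"] * queue["cpu_power"]
```

For example, a job from SLD_MEDIUM (1 hour) requesting VAX and CART would be accepted by a queue that has both characteristics, a 30 minute limit and a relative CPU power of 2, but rejected if the relative CPU power were 1.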

If an eligible job is found, it is moved into the execution queue, then the job_juggler resubmits itself and terminates. If no eligible jobs are found, the job_juggler waits a certain time (currently four minutes) and then wakes up and looks for more jobs. After 50 unsuccessful attempts to find an eligible job, the job_juggler resubmits itself and then exits (to avoid problems with infinitely large log files or CPU time limits).
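The search, wait and resubmit cycle described above can be sketched as a bounded polling loop. This Python version is an illustration only: the callables `find_eligible_job`, `move_job` and `resubmit` are hypothetical stand-ins for what the DCL procedure does, and it is an assumption that no wait follows the final unsuccessful attempt.

```python
import time

WAIT_SECONDS = 4 * 60   # "currently four minutes"
MAX_ATTEMPTS = 50       # give up and resubmit after this many empty searches

def juggle(find_eligible_job, move_job, resubmit, sleep=time.sleep):
    """Run one life cycle of the juggler; return the job moved, or None."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        job = find_eligible_job()
        if job is not None:
            move_job(job)   # move the job into this execution queue
            resubmit()      # queue a fresh juggler, then terminate
            return job
        if attempt < MAX_ATTEMPTS:
            sleep(WAIT_SECONDS)
    # Nothing found in MAX_ATTEMPTS searches: start afresh so that the
    # log file and accumulated CPU time stay bounded.
    resubmit()
    return None
```

Passing `sleep` as a parameter makes it possible to exercise the loop without actually waiting.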

One special case concerns the CART characteristic. If the execution queue the job_juggler is running in has the CART characteristic, then job_juggler first checks to see if there are any free tape drives. If not, it proceeds as if the queue did not have the CART characteristic (i.e. it will not accept any jobs which require CART). If there are free tape drives, it then checks that both the DCSC and SETUP systems are functioning correctly. If it finds that either system is bust, it:

Sends a whining message (currently to TONYJ and CXGSYS).
Proceeds as if the queue did not have the CART characteristic.
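The CART handling above can be viewed as computing an "effective" set of characteristics for the queue before the eligibility search begins. The Python sketch below illustrates that idea; the function name, its parameters and the `notify` callback are assumptions made for the example, not part of the real system.

```python
def effective_characteristics(queue_chars, free_drives, dcsc_ok, setup_ok, notify):
    """Return the characteristics the queue should be treated as having."""
    chars = set(queue_chars)
    if "CART" not in chars:
        return chars
    if free_drives == 0:
        # No free tape drives: behave as if the queue lacked CART.
        return chars - {"CART"}
    if not (dcsc_ok and setup_ok):
        # Drives are free but DCSC or SETUP is down: complain, drop CART.
        notify("DCSC or SETUP not functioning; not accepting CART jobs")
        return chars - {"CART"}
    return chars
```

Jobs that request CART then fail the ordinary characteristics test whenever tapes cannot be used, with no special case needed in the search itself.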

This system is designed to attempt to prevent jobs which need tapes from starting when they obviously cannot run. It is not perfect, however, since the job_juggler does not know how many tape drives a job will actually use, nor is there any mechanism to prevent two jobs starting simultaneously when there is only one drive available. This is not fatal, however, since SLD jobs wait if they need resources until these resources become available.

Finally, if job_juggler moves a job into an execution queue that has a superseded queue specified, then the superseded queue is stopped, resulting in any jobs running in that queue being suspended. The queue is only restarted when there are no longer any jobs running or eligible to be run in the execution queue. This allows (for example) user jobs to completely replace MC jobs at short notice, giving good turnaround for user jobs. (Note that suspended processes are the first processes VMS swaps out of memory if physical memory is short.)
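The stop and restart rule for a superseded queue can be sketched as two small state transitions. In this Python illustration, the set `stopped` stands in for the real queue state, and the function names and job counts are hypothetical; in the real system the effect would be achieved with VMS queue commands.

```python
def after_move(superseded, stopped):
    """Called when a job has been moved into the execution queue."""
    if superseded is not None:
        stopped.add(superseded)   # stopping the queue suspends its jobs

def after_idle_check(superseded, running_jobs, eligible_jobs, stopped):
    """Restart the superseded queue only when the execution queue has
    nothing running and nothing eligible to run."""
    if superseded in stopped and running_jobs == 0 and eligible_jobs == 0:
        stopped.discard(superseded)
```

With a user analysis queue superseding an MC queue, for instance, the MC queue would stay stopped for as long as any user job is running or waiting to run.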

Job_juggler keeps a log of all jobs which it moves in job_juggler.log. This log file records the owner, class, submission time and start time of each job. At some point in the future an analysis program may be written to generate statistics from this log.
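As an indication of what such an analysis program might compute, the Python sketch below derives the mean queue wait per job class from the four recorded fields. The on-disk format of job_juggler.log is not described here, so the records are assumed to have been parsed already into (owner, class, submitted, started) tuples; that layout and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def mean_wait_minutes(records):
    """Return {job class: mean minutes between submission and start}."""
    waits = defaultdict(list)
    for owner, job_class, submitted, started in records:
        waits[job_class].append((started - submitted).total_seconds() / 60.0)
    return {cls: sum(w) / len(w) for cls, w in waits.items()}
```

The same records could equally be grouped by owner to see who is waiting longest.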

Tools for system maintainers

There are two tools for system maintainers:

CHECKQ.COM: Produces a listing of each queue on the cluster, showing such things as maximum CPU time allowed, node, whether job_juggler is running etc. for each queue.
SETQ.COM: Scans each queue and sets appropriate characteristics as well as starting job_juggler on any queue it should be running on but is not.

Work that still needs to be done

The relative CPU weightings have to actually be implemented (easy).
A way of enforcing the CART characteristic to be specified for jobs which use tapes would be nice (i.e. a way to kill jobs which attempt to use tapes but which did not specify /CHAR=CART). (done!)
Some sort of anti-flooding mechanism à la VM would be nice to stop Homer from using all the execution queues. (Experimental system now in place, see Appendix.)
A way of requeuing jobs which need a resource (tape, tape drive etc.) which is currently unavailable would be nice. (hard - needs mods to SETUP)
It would be nice if the BATQ command worked faster (this would probably require rewriting it as a C program instead of a COM file). (done!)

Tony Johnson
February 1994
Updated August 1994