Workbook for SLD Offline Users - Batch Processing

Purpose of the SLACVX batch system

Up to now, you have been running the SLD offline software in interactive mode. An alternate way of running, called "batch mode," tends to be more appropriate for certain types of work. Batch mode uses all of the same commands as interactive IDA, but all of your IDA commands must be prepared in advance and stored in a file.

In this section of the workbook, you will learn when to use batch mode instead of interactive mode, how to submit batch jobs, how to monitor their progress and how to retrieve their output.

Most of what you will learn in this section applies both to running on the SLACVX cluster and to running on any other VMS system. However, some aspects of how the queues work and how you monitor jobs are different on the SLACVX cluster, since the SLACVX cluster contains SLD extensions to the basic VMS batch system.

When to Use Batch versus Interactive

SLD users typically run in interactive mode when they want to study a small number of events in great detail. You should run this way when you are developing your algorithms, trying out event selection criteria or testing a new piece of reconstruction code. But once you have written your code and you want to run that same code on a large number of events, it is time for you to run in batch mode.

Because batch jobs make more efficient use of computer resources such as processor power and tape drives, batch jobs are given higher priority for these resources.

For reasons discussed in the workbook section The Tape System, batch processing is the only way you are allowed to write tapes.

Batch mode is preferable for complicated jobs because it leaves a more organized record of your work. When it comes time to redo an analysis with minor changes, you will find the rework much easier if the original work was a batch job.

Finally, it is simply impolite to tie up resources in interactive mode that could better be shared with your collaborators by running in batch mode.

Don't be afraid to use interactive mode, but keep to the basic rule:

Writing the COM File

To run a job in batch mode, you put a set of commands into a .COM file and then use the SUBMIT command to send this COM file to the batch system.

When the batch job runs:

An IDA Job to Write to Your Own Tape

Now that you are learning about batch processing, you have the opportunity to write to tapes as well as read from them. The next exercise will teach you how to copy a selected set of events from a standard data set to a tape of your own. This is often the first step of a physics analysis. The selection you will use in this example is part of the process for collecting the suspected Z events from a data set. A full Z selection uses more cut criteria than we will include here and will be done later in the workbook.

If you do not already have a tape of your own, issue a GETFREE command to get one.

Create a new subdirectory of your login directory to keep all of the batch files in. This is not absolutely necessary, but it provides a good way of organizing your work.

   SET DEFAULT SYS$LOGIN
   CREATE/DIR [.BATCHTEST]
   SET DEFAULT [.BATCHTEST]

Create a new file to store the commands that your batch job will execute. You can use any file name with the file extension .COM. For this example, make it TESTONE.COM.

Type or paste the following code into TESTONE.COM:

   $SET DEFAULT [.BATCHTEST]
   $DUCS PROD ALL
   $IDA
      OPENTAPE READ REC94_MDST STAGE WAIT
      OPENTAPE WRITE your_tape_id.1/FIVETRK 

      DEF EVANAL
         NTRACKS=0
         BANKLOOP PHCHRG
            IF PHCHRG%(NHIT) > 40
               NTRACKS=NTRACKS+1
            ENDIF
         ENDLOOP

         IF NTRACKS > 5
            WRITE USING INLIST
            TYPE "Wrote Event   " _IEVENTH%(EVENT)
         ELSE
            TYPE "Rejected Event" _IEVENTH%(EVENT)
         ENDIF
      ENDDEF

   GO 200
   QQUIT

With the exception of the dollar signs in the first three lines, this file contains exactly what you would type to run this IDA job interactively from a fresh login. But, as we have discussed, the OPENTAPE WRITE command would be rejected from an interactive job.

In a COM file, every line that is a DCL command is preceded by a dollar sign. Once the job has started IDA, the commands are IDA commands, not DCL commands, so they do not have dollar signs.

When your job is run, the result will be a new data tape that contains only the events with more than five charged tracks with more than 40 hits each.
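
The selection made by the EVANAL block above can be restated outside of IDA. The following is only an illustrative Python sketch of the same counting logic; the function name and the list of hit counts are hypothetical stand-ins for the PHCHRG bank loop, and nothing here calls IDA.

```python
# Illustrative sketch of the selection logic in TESTONE.COM.
# Names are hypothetical; this does not use IDA or read any SLD data.

MIN_HITS = 40    # a track counts only if it has more than this many hits
MIN_TRACKS = 5   # an event is kept only if more than this many tracks count


def passes_selection(hits_per_track):
    """Return True if an event with these per-track hit counts would be written."""
    ntracks = 0
    for nhit in hits_per_track:       # mirrors BANKLOOP PHCHRG
        if nhit > MIN_HITS:           # mirrors IF PHCHRG%(NHIT) > 40
            ntracks += 1
    return ntracks > MIN_TRACKS       # mirrors IF NTRACKS > 5


if __name__ == "__main__":
    six_good_tracks = [45, 50, 41, 60, 55, 47]
    five_good_tracks = [45, 50, 41, 60, 55, 40]   # last track has exactly 40 hits
    print(passes_selection(six_good_tracks))
    print(passes_selection(five_good_tracks))
```

Note that both comparisons are strict: a track with exactly 40 hits is not counted, and an event with exactly five counted tracks is rejected.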

How the Different Queues Work

You are almost ready to submit your job, but before you submit it you need to know something about "Batch Queues."

Batch systems typically need to accommodate both very long jobs and very short jobs. Sharing limited computer resources between these two kinds of jobs is difficult. Some of the most important jobs tend to be long ones (such as preparing the Monte Carlo data set needed for an analysis to be presented at an upcoming conference), but shorter jobs take so little time to run that it doesn't make sense to have them wait until all the long, important jobs are finished. There can also be problems with different jobs competing for the same tapes or tape drives.

Batch systems typically handle these problems by designating different "batch queues" for different types of jobs. There are some queues for shorter jobs and some queues for longer jobs. The system then assigns some resources to each queue. If a given queue is empty, the system may try to allocate more resources to other queues (though for complicated but valid reasons, it sometimes makes sense to reserve resources for a queue that happens to be empty).

It is important to submit each job to the appropriate batch queue.

Non-SLACVX System Queues

If you are NOT running on the SLACVX cluster, you will probably be using the standard VMS batch system.

To see what batch queues are available,

type SHOW QUEUE host_name*

Batch queue names will have suffixes like "EXPRESS", "MEDIUM" or "CRUNCH."

Express queues typically handle short jobs. Crunch queues typically handle long jobs.

SLACVX Cluster Queues

When you submit jobs on the SLACVX cluster, you get additional features from the SLD extensions to the VMS batch system. Without the SLD extensions, you would probably first want to check all of the hosts in the SLACVX cluster to decide which one can run your job fastest. But the SLD extensions give you something called the "SLD Job Juggler," which does this checking for you and submits your job to the host that can run it the soonest. The Job Juggler also takes care of a variety of other features to ensure that SLD makes efficient use of the available SLACVX cluster resources.

Only the Job Juggler has direct access to the actual SLACVX batch queues. Users submit jobs to the Job Juggler's virtual queues.

The Job Juggler queue names are as follows:

   Queue            Time Limit                                                 
   ============     ========== 
   SLD_EXPRESS       2 minutes                                                  
   SLD_FAST          8 minutes                                                  
   SLD_SHORT        20 minutes                                                  
   SLD_MEDIUM        1 hour                                                     
   SLD_LONG          3 hours                                                    
   SLD_CRUNCH        Infinite                                                   
   SLD_STAGE         Reserved for production use                                
   SLD_MC            Reserved for production use                                
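
As a rough guide to reading this table, the Python sketch below picks the shortest user queue whose limit covers an estimated run time. It is only an illustration: the function name is hypothetical, the limits are copied from the table above, and treating a job that exactly equals the limit as fitting is an assumption. In practice you should leave some margin.

```python
# Time limits copied from the Job Juggler table, in minutes, shortest first.
# None stands for "Infinite". SLD_STAGE and SLD_MC are reserved for
# production use, so they are left out.
QUEUE_LIMITS_MINUTES = [
    ("SLD_EXPRESS", 2),
    ("SLD_FAST", 8),
    ("SLD_SHORT", 20),
    ("SLD_MEDIUM", 60),
    ("SLD_LONG", 180),
    ("SLD_CRUNCH", None),
]


def pick_queue(estimated_minutes):
    """Return the name of the shortest queue whose limit covers the estimate."""
    if estimated_minutes < 0:
        raise ValueError("estimated time must not be negative")
    for name, limit in QUEUE_LIMITS_MINUTES:
        # Assumption: a job that exactly reaches the limit still fits.
        if limit is None or estimated_minutes <= limit:
            return name
    raise AssertionError("unreachable: SLD_CRUNCH has no limit")


if __name__ == "__main__":
    for minutes in (1, 15, 45, 500):
        print(minutes, pick_queue(minutes))
```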

Submitting the Job

Your example batch job will not take very long to run. If you are running on the SLACVX cluster, you should be able to run it in the SLD_EXPRESS queue. If you are running on another VMS host, you should be able to run it in whatever queue is set up for fast jobs (it probably has "EXPRESS" somewhere in the queue name).

To run the job, use the SUBMIT command.

SLACVX Cluster Running

type SUBMIT/QUEUE=SLD_EXPRESS/CHAR=CART TESTONE.COM

The extra option, CHAR=CART, tells the Job Juggler that it needs to reserve tape cartridge handling resources for your job. Without this option, your job will not be allowed to use tapes.

Two other common options are CHAR=VAX and CHAR=ALPHA. These options cause your job to be run only on the specified type of host. The SLACVX cluster contains both VAX and ALPHA machines. The code that is in DUCS will run on either type of machine, but when you get to writing your own code, you will want to run it on the same kind of machine that it was compiled on.

To use multiple CHAR options in the same SUBMIT, put them together within parentheses, separated by commas, as in:

SUBMIT/QUEUE=SLD_EXPRESS/CHAR=(CART,VAX) TESTONE.COM

For other SUBMIT options, see The SLACVX Batch System.

Non-SLACVX Cluster Running

type SUBMIT/QUEUE=queue_name TESTONE.COM

The system should respond with a message that your job is pending.

Monitoring the Job via BATQ

SLACVX Cluster Monitoring

You can monitor the progress of your job by using the BATQ command.

type BATQ/USER=your_userid

If you get the response "No jobs found," it means your job has completed.

BATQ with no options shows you all jobs in the batch system.

For other BATQ options, see The SLACVX Batch System.

Non-SLACVX System Monitoring

To monitor the progress of your job, you will need to remember the job's entry number. This number was reported back to you when you did the SUBMIT. If you have forgotten the number, you can find it by checking all of the entries in the given queue.

Type SHOW QUEUE/ALL queue_name

To get full details on your job's progress, once you have the entry number,

type SHOW ENTRY/FULL entry_number

Getting the Output

When your job has completed, a record of the job will be left behind in your login directory under the same name as your .COM file but with the file extension .LOG. Thus for your example job, your log file will be called TESTONE.LOG. It contains everything that would have been typed to your terminal if this had been an interactive session. At the end of the log file, there is a block of information about the amount of system resources that were consumed by your job.

Of particular interest is the "Charged CPU time." This is the time that is continually checked against the limits allowed by the particular batch queue that you are using. The units are days, followed after a space by hours, minutes, seconds and hundredths of a second.
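
If you want to compare this figure with a queue limit, it can help to convert it to seconds. The Python sketch below shows one way to do so. It assumes the value is laid out as in a typical VMS log, for example "0 00:01:23.45" (days, a space, then hours, minutes and seconds separated by colons, and hundredths after a period). That layout and the function name are assumptions, so check them against your own log file.

```python
import re

# Assumed layout: "D HH:MM:SS.CC", for example "0 00:01:23.45".
_CPU_TIME = re.compile(r"^\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d{2})\s*$")


def charged_cpu_seconds(text):
    """Convert a charged CPU time string to a number of seconds."""
    match = _CPU_TIME.match(text)
    if match is None:
        raise ValueError("unrecognized CPU time format: %r" % text)
    days, hours, minutes, seconds, hundredths = (int(g) for g in match.groups())
    whole_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return whole_seconds + hundredths / 100


if __name__ == "__main__":
    used = charged_cpu_seconds("0 00:01:23.45")
    limit = 2 * 60   # SLD_EXPRESS limit of 2 minutes, in seconds
    print("within SLD_EXPRESS limit:", used <= limit)
```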

As soon as your job begins running, the log file will appear on disk. You can type out the log file or view it in an editor at any time while the job is running to check on the progress of your job. By checking in this way, you can often detect a problem before the job has finished running and can stop the job if it is not working correctly.

Deleting a Job

To stop a job that is in queue or is running, you first need to get the job's entry number by using BATQ (SLACVX cluster) or SHOW QUEUE (non-SLACVX systems) as described above.

You can then stop the job by typing

DELETE/ENTRY=entry_number

The DELETE command gives no output unless it FAILS to find the job. To confirm that the DELETE has worked, do another BATQ or SHOW QUEUE.

Use DELETE only on jobs you directly submitted. Do not use it to delete staging jobs (jobs automatically submitted by the STAGE command). Doing this will cause the staging system to lose track of what it has staged.

Check Your Results

If your job succeeded, you should now have a data set on your tape that contains only the events with more than five charged tracks with more than 40 hits each. Check your tape now by opening it and writing out the event numbers. You can do this from an interactive job using the following IDA commands:

   OPENTAPE READ your_tape_id.1/FIVETRK STAGE WAIT

   DEF EVANAL
      TYPE "Tape contains event" _IEVENTH%(EVENT)
   ENDDEF

   GO 0
   CLOSE INFILE

You could try the same thing from a batch job if you prefer.

To complete this exercise, enter the tape in your private Datacat. If you do not have one already, follow the instructions on how to make your own private Datacat. Then create a nickname for your new data set.

Now go back to IDA and reopen your tape, this time using the nickname:

   OPENTAPE READ your_nickname STAGE WAIT

   DEF EVANAL
      TYPE "Tape contains event" _IEVENTH%(EVENT)
   ENDDEF

   GO 0
   CLOSE INFILE

Conclusion and a Reference

You have now created a batch job, run it, monitored its progress and retrieved its output. In the process, you have learned the last part of the tape system: how to write to your own tape. Your future batch jobs may involve more elaborate IDA code and more elaborate DCL commands before the IDA code, but the basic process will be the same.

For further details on running batch on the SLACVX cluster, see The SLACVX Batch System by Tony Johnson.



Joseph Perl
13 November 1996