Workbook for SLD Offline Users - DATA

As in most large experiments, SLD data is available in many forms. Understanding where to find the data you need, the form it is in, and the method of accessing it is essential to doing physics at SLD. This chapter is designed to give you an overview and working knowledge of the SLD data stream.


Banks: the Basic Units of the Data Structure

Jazelle is a data management package designed to provide facilities for data structure manipulation. The basic element from which Jazelle data structures are built is called a "bank." The structure of a Jazelle bank is defined by a "template." A bank may contain any combination of variable data types, such as integers, reals and strings.

Jazelle banks are grouped into "families," each with a unique "family name." Each family has its own template, which defines the structure of the banks belonging to it, so all banks in a family have the same structure. Each family of banks contains information specific to a particular aspect of the event. For instance, the PHCHRG banks store all the tracking information: there is one PHCHRG bank per track, and the entire set of tracking (PHCHRG) banks forms the family PHCHRG. Each bank in a given family has its own bank id, an integer from 0 to 65535. To uniquely identify a bank, one must specify both the family name and the bank id.
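
The addressing rule described above (family name plus bank id) can be sketched in Python. This is only an illustration of the idea and not Jazelle's actual interface: the names BankKey and BankStore, the "momentum" field, and the assumption that both 0 and 65535 are valid ids are all hypothetical.

```python
from dataclasses import dataclass

MAX_BANK_ID = 65535  # upper end of the bank id range given in the text


@dataclass(frozen=True)
class BankKey:
    """Hypothetical key: a bank is identified by family name plus bank id."""
    family: str
    bank_id: int

    def __post_init__(self):
        # Assumption: the id range is inclusive at both ends.
        if not 0 <= self.bank_id <= MAX_BANK_ID:
            raise ValueError(f"bank id {self.bank_id} outside 0-{MAX_BANK_ID}")


class BankStore:
    """Hypothetical in-memory store of banks; not the Jazelle API."""

    def __init__(self):
        self._banks = {}

    def add(self, family, bank_id, contents):
        key = BankKey(family, bank_id)
        if key in self._banks:
            raise KeyError(f"{key} already exists")
        self._banks[key] = contents

    def get(self, family, bank_id):
        return self._banks[BankKey(family, bank_id)]

    def family(self, family):
        """Return the contents of all banks in one family, ordered by bank id."""
        keys = sorted(
            (k for k in self._banks if k.family == family),
            key=lambda k: k.bank_id,
        )
        return [self._banks[k] for k in keys]


store = BankStore()
# One PHCHRG bank per track; the "momentum" field is invented for illustration.
store.add("PHCHRG", 1, {"momentum": 12.5})
store.add("PHCHRG", 2, {"momentum": 3.1})
print(store.get("PHCHRG", 2))
print(len(store.family("PHCHRG")), "tracks in this event")
```

Neither the family name nor the bank id is enough on its own: two families may both contain a bank with id 1, so the lookup always takes the pair.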

To fully characterize an event, one needs several different families of banks to describe all of its aspects. Furthermore, the families associated with Raw data differ from those associated with Reconstructed data. To keep the proper groups of families together, the banks are organized into CONTEXTS. Thus all banks containing Raw data belong to the CONTEXT "RAWDATA" and all the Parameter banks belong to the CONTEXT "PARAMETERS". Each family of banks belongs to some CONTEXT. We will see how this comes into play later.
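
The grouping of families into contexts can be pictured as a simple mapping. The sketch below is an assumption-laden illustration: only the context names RAWDATA and PARAMETERS come from the text, while the family names and the helper function are invented.

```python
from collections import defaultdict

# Hypothetical assignment of families to contexts. The family names are
# placeholders; only the two context names appear in the workbook text.
FAMILY_CONTEXT = {
    "RAWFAM_A": "RAWDATA",
    "RAWFAM_B": "RAWDATA",
    "PARFAM_A": "PARAMETERS",
}


def families_by_context(family_context):
    """Invert a family -> context map into context -> sorted list of families."""
    grouped = defaultdict(list)
    for family, context in family_context.items():
        grouped[context].append(family)
    return {context: sorted(names) for context, names in grouped.items()}


contexts = families_by_context(FAMILY_CONTEXT)
for context, families in contexts.items():
    print(context, families)
```

Because each family belongs to one context, asking for a context is a shorthand for asking for every family that lives in it.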

At this point it is easier to learn from example!

Data Exercise 1: Using Data Banks

Datacats: Finding the Data You Need

SLD data may appear on tape, on disk, or both. To provide easy access for the user, SLD keeps catalogs of all data in the form of "datacats." Information for each type of data (Raw, Reconstructed, Polarimetry, etc.) is kept in its own datacat. The datacats are further separated into "run years" to avoid confusion. In the datacats we can find information about particular runs, such as where the data is stored and how many events each run contains.
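
The kind of lookup a datacat supports can be sketched as follows. The real datacat layout is not shown in this chapter, so the record fields, run numbers, locations and event counts below are all made up; only the idea of one catalog per data type and run year, listing storage location and event count per run, is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical datacat record for one run."""
    run: int
    location: str   # a tape label or disk name (invented values below)
    n_events: int


# Hypothetical catalogs keyed by (data type, run year).
DATACATS = {
    ("RAW", 1994): [
        CatalogEntry(run=1001, location="TAPE-A", n_events=5200),
        CatalogEntry(run=1002, location="TAPE-B", n_events=4800),
    ],
    ("RECON", 1994): [
        CatalogEntry(run=1001, location="DISK-1", n_events=310),
    ],
}


def find_run(data_type, year, run):
    """Return the catalog entry for one run, or None if it is not listed."""
    for entry in DATACATS.get((data_type, year), []):
        if entry.run == run:
            return entry
    return None


def total_events(data_type, year):
    """Sum the event counts of every run listed in one catalog."""
    return sum(e.n_events for e in DATACATS.get((data_type, year), []))


print(find_run("RAW", 1994, 1002))
print(total_events("RAW", 1994))
```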

Data Exercise 2: Exploring Datacats

The Data Stream: Raw, DST, Mini-DST and Filtered

The event data from any given run can be found in three basic forms: Raw Data, DST data and Mini-DST data. The Raw data consists of banks filled with quantities read out from the Detector. The Raw data is then further processed by filtering software that separates the "Good" or "Useful" data from the junk (such as events produced by cosmic rays or by beam-gas interactions). The selected events are then processed by reconstruction software and the results are stored as DST data and Mini-DST data.

Raw Data

The original data tapes, which are written out directly from the detector's online electronics, are called the "Acquisition" data tapes. These tapes are listed in the ACQ.DATACAT. Users are not supposed to read these tapes directly. Instead, a computer process called ACQCOPY makes exact copies of these tapes onto a second set of tapes called the "Raw" tapes. This process runs automatically as soon as a run is completed (a run being about four hours of data taking).

The ACQ tapes are then stored and never read again. They form a backup that protects the basic integrity of SLD's data. They are only read if a problem develops with the copy. It is the copies, the "Raw" tapes, that users are encouraged to use. One may see a listing of the Raw tapes in the RAW.DATACAT.

The Event Filters: Pass1 and Pass2

To cut the Raw data down to a manageable size, a process called the Pass1 Filter is run on all raw data. The intention is that data failing the filter need NEVER be looked at again. The filter also separates the data into several streams for the subgroups that need them. Physics and physics calibrations are streamed off into: The output tapes can be found in the HAD.DATACAT.

A process called the Pass2 Filter is then run on all of the physics events that passed the Pass1 filter. This second filter tags WABs, hadrons, taus and mu-pairs. The filter results are written into the SFCLASS bank (variable evtclass). The reconstructed events are written to tape and can be found in the RECON.DATACAT. Note that these events are still not considered "Physics Ready," as the final constants banks (drift chamber constants, etc.) have not yet been computed.
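
The two-stage filtering described in this section can be sketched as a small pipeline. This is a toy model under stated assumptions: the selection criteria, event fields and tag strings are invented and bear no relation to the real Pass1 and Pass2 code. Only the overall flow (Pass1 discards events, Pass2 tags the survivors, and the tag is recorded under the name evtclass) follows the text.

```python
def run_filters(raw_events, pass1_keep, pass2_classify):
    """Two-stage filter sketch.

    pass1_keep(event) -> bool decides whether an event survives Pass1.
    pass2_classify(event) -> str returns a tag for a surviving event.
    Both callables are placeholders for the real SLD filter logic.
    """
    tagged = []
    for event in raw_events:
        if not pass1_keep(event):
            continue  # failed Pass1: dropped and not examined further
        # Record the tag under "evtclass", echoing the SFCLASS variable name.
        tagged.append({**event, "evtclass": pass2_classify(event)})
    return tagged


# Toy events and toy criteria, invented purely for illustration.
events = [
    {"id": 1, "energy": 90.0, "n_tracks": 20},
    {"id": 2, "energy": 2.0, "n_tracks": 1},
    {"id": 3, "energy": 88.0, "n_tracks": 2},
]


def toy_pass1(event):
    return event["energy"] > 10.0


def toy_pass2(event):
    return "hadron" if event["n_tracks"] > 4 else "other"


selected = run_filters(events, toy_pass1, toy_pass2)
for event in selected:
    print(event["id"], event["evtclass"])
```

Passing the two criteria in as functions keeps the flow separate from the physics, which is the part this sketch cannot reproduce.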

Physics-Ready Data: DST and Mini-DST

The fully Reconstructed data with constants appears in two forms referred to as DST and Mini-DST. The difference between the two is the amount of information stored per event, that is, the number of banks.

The DST

The first form developed at SLD to store reconstructed event data is referred to as the "DST" form, where DST stands for Data Summary Tapes (the term is used in many Particle Physics experiments). The DST form contains reconstructed tracking, calorimetry and vertexing information. Everything you need to do serious physics analysis at SLD is contained in the banks of the DST.

The Mini-DST

As the volume of data collected by SLD grew larger, the reconstructed data files became too large to conveniently keep on disk and too large to quickly read. People at SLD began to ask whether a more compact sort of DST could be devised. It was determined that much of the data in the old DST was needed only by a few people and was not needed by most Physics analyses. A more compact form of DST was created, which contains only the most commonly used quantities. This compact form is called the "Mini-DST."

The Mini-DST is designed to allow the entire set of SLD Physics events to be kept on disk at the same time. This allows multiple users to work with the same data set simultaneously and eliminates the time required for mounting and reading tapes. Users are strongly urged to use the Mini-DST rather than the DST whenever possible. Furthermore, users are encouraged to create their own Micro-DST, since even the Mini-DST has more information than most will need.
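
The idea behind a personal Micro-DST, keeping only the banks an analysis actually uses, can be sketched as below. This is a hypothetical illustration and not an SLD tool: events are modeled as plain dictionaries from family name to a list of banks, and apart from PHCHRG the family names and all field values are invented.

```python
def make_micro_dst(events, keep_families):
    """Copy each event, keeping only the bank families the analysis needs."""
    keep = set(keep_families)
    return [
        {family: banks for family, banks in event.items() if family in keep}
        for event in events
    ]


# Toy Mini-DST events. PHCHRG is named in the text; CALFAM and VTXFAM
# are placeholder family names, and the bank contents are made up.
mini_dst = [
    {
        "PHCHRG": [{"momentum": 12.5}, {"momentum": 3.1}],
        "CALFAM": [{"energy": 40.2}],
        "VTXFAM": [{"z": 0.01}],
    },
    {
        "PHCHRG": [{"momentum": 7.7}],
        "CALFAM": [{"energy": 45.0}],
    },
]

micro_dst = make_micro_dst(mini_dst, ["PHCHRG"])
print(micro_dst)
```

A tracking-only analysis in this toy model reads one family per event instead of three, which is the saving a Micro-DST is meant to provide.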

Some Useful Web Pages: Pointers to Recent Recon Data Sets

All of the existing Data and Monte Carlo are periodically re-reconstructed. This generally occurs following a set of major improvements to the reconstruction code or the Monte Carlo code. This could involve improvements to tracking or cluster finding, new and better alignment data, improved Monte Carlo models, or other sorts of bug repairs (or bug creations).

At the time of this writing, the most recent reconstruction is the one called "Recon 11." Some Web pages are available to help you locate this data.


Eric Weiss
9 May 1995