LPC Computing

The New Tutorial Scripts

Introduction

These scripts are examples of how to run CMS software at FNAL, making use of the Condor batch systems to parallelize the jobs. Please note that this is only a collection of tested examples to show how you might do things yourself and not a robust, automatic production service. In addition to running the examples, we will attempt to give some hints about how to read the log files that are produced so you can check that your jobs ran correctly.

There is one minor computing detail that needs to be explained here. In these notes, we implicitly assume that your login shell is the C shell or one of its look-alikes. Also, we assume that the current working directory ($cwd) may not be in your path. If that is the case, invoking an executable named myexe in $cwd must be done via ./myexe since the file myexe will not be found otherwise.

Note that we will be making some use of the SCRAM facility in what follows. For some SCRAM basics, have a look at the separate note, "Getting Started With SCRAM."

Access To The Repository

All the scripts and a few templates for different physics channels are provided in the LPC CVS repository. To check out your copy of the tutorial scripts from CVS, do the following:

For read-only access:

setenv CVSROOT :pserver:anonymous@cdcvs.fnal.gov:/cvs/lpc
cvs login
(password is lpc)
cvs co -r SL3_1_0 newtutorial
cd newtutorial

If you are a developer, have the appropriate privileges and want to check out with write access, do:

 
setenv CVSROOT lpccvs@cdcvs.fnal.gov:/cvs/lpc
cvs co -r SL3_1_0 newtutorial
cd newtutorial

Because of the -r SL3_1_0 option, this checks out the tagged release; omit that option to check out from the head of the repository, which is appropriate for a developer. Furthermore, you can browse the contents of the tutorial on the web via the URL

http://cdcvs.fnal.gov/cgi-bin/public-cvs/cvsweb-public.cgi/newtutorial/?cvsroot=lpc

Generating The Job Submission Scripts

The job submission scripts are all generated by running a Perl script named setup.pl, which accepts a number of arguments. All of these arguments have reasonable default values, any of which can be overridden using one or more of the options described below. To display all options and see the default values, do:

 
./setup.pl -h

This will produce output similar to:

Syntax setup.pl [ -h
| -s <OSCAR version>
| -f <datacards>
| -r <ORCA version>
| -g <CMKIN version>
| -p <path>
| -e <email>
| -c <copycommand>
| -d <destination path> ]

where

-h: This help message
-f: name of the physics channel. Default is zprime700_mumu
-s: OSCAR version. Default is OSCAR_3_6_5
-r: ORCA version. Default is ORCA_8_7_3
-g: CMKIN version. Default is CMKIN_4_3_1
-p: path for tutorial. Default is /uscms_data/d1/$LOGNAME/Tutorial
-e: email. Default is $LOGNAME@fnal.gov
-c: copy command. Default is /bin/cp
-d: destination path for data. Default is /uscms_data/d1/$LOGNAME/Tutorial/data

Beware! While you can change the versions of ORCA, OSCAR and CMKIN, it is not guaranteed that all possible version combinations will work. Currently, setup.pl does not even check that the selected versions exist, nor does it modify the data cards to ensure that they are appropriate for the selected versions. We might add such checks and data card modifications in the future, but they are not implemented yet. In short, you should not depart from the defaults unless you are quite sure of what you are doing and are willing to make the necessary modifications yourself. We may eventually indicate consistent CMKIN, OSCAR and ORCA releases with an overall CVS tag, but this is not being done at present.

If you are happy with the defaults, simply type:

./setup.pl

A more interesting possibility is to select a nondefault physics process such as:

 
./setup.pl -f h300eemm

In either case, this will create all the necessary scripts to run the selected analysis and physics process through the entire processing chain of

  • event generation with CMKIN
  • simulation of the CMS detector with OSCAR
  • digitization of the simulated hits with ORCA
  • reconstruction and production of the DST (optional)
  • production of a Root Tree for analysis with ExRootAnalysis

None of the default steps requires the user to recompile any code. Only those executables which are provided with the software distribution are used. (In what follows, we assume you have chosen the default physics channel. The generalization to another channel is straightforward.)

Examine The Created Scripts

One of the scripts that is created by setup.pl is setup_tut.csh. Execute the command

 
source ./setup_tut.csh

to define a few environment variables that will make it easy to navigate between the different relevant directories:

  • $TUTORIAL: The directory where you checked out the tutorial
  • $TOP: The top directory of the destination (scripts, data and so on)
  • $PNFS_PATH: The directory where the data ends up
  • $ANALYSIS: A directory with various analysis scripts to examine the output using the Root system
  • $SCRIPTS: The directory where the submission scripts for condor are found
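Before moving on, it can be useful to confirm that all five variables are actually defined in your current shell. The following is a minimal Python sketch of such a check, not part of the tutorial scripts. The function name missing_tutorial_vars is hypothetical, and the sketch assumes that each variable is exported to the environment and points to a directory that already exists.

```python
import os

# Variable names taken from the list above.
TUTORIAL_VARS = ("TUTORIAL", "TOP", "PNFS_PATH", "ANALYSIS", "SCRIPTS")


def missing_tutorial_vars(environ=None):
    """Return the tutorial variables that are unset or are not directories.

    `environ` defaults to the environment of the running process, so the
    check must be started from the shell in which setup_tut.csh was sourced.
    """
    env = os.environ if environ is None else environ
    # An unset variable maps to "", which os.path.isdir reports as False.
    return [name for name in TUTORIAL_VARS
            if not os.path.isdir(env.get(name, ""))]
```

An empty list from missing_tutorial_vars() means every variable is set and resolves to a directory. Any names returned are the ones to investigate.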

To have a look at the various job submission scripts, do:

 
source ./setup_tut.csh
cd $SCRIPTS
ls

and examine any of the scripts found there.

In addition, the subdirectory templates/datacards shows the currently available physics channels.

Along with these job submission scripts, you should find two other, more special-purpose utility scripts, one of which we describe briefly here because you will need it to do analysis.

The CMS mass storage system is tuned to operate efficiently with a reasonable number of rather large files. That efficiency degrades rapidly and badly when the system is confronted with an unreasonably large number of smaller files. (In this context, large begins at something of the order of a Gigabyte.) To avoid this problem, the scripts that setup.pl generates gather up all of the files that each application writes into a single compressed tar file. That tar file is still not very large by our standards but, at least, there is only one file instead of 20 or so. However, a tar file is useless for analysis, so the file must be unwound again. That is the job of the first of the utility scripts, which is named prepare.csh. The name is intended to suggest that it helps you prepare to do analysis.
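To make the idea of unwinding concrete, here is a rough Python sketch of unpacking every compressed tar file in a directory into that same directory. It is an illustration only and not the contents of prepare.csh. The function name unpack_archives and the file pattern *.tar.gz are assumptions, so adjust the pattern to the names your jobs actually produce.

```python
import glob
import os
import tarfile


def unpack_archives(directory, pattern="*.tar.gz"):
    """Unpack each archive matching `pattern` into `directory` itself.

    The default pattern is an assumption, not something the tutorial fixes.
    Returns the list of archives that were unpacked.
    """
    # Older Python versions do not accept the `filter` argument, so pass
    # the safer "data" filter only where tarfile supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    unpacked = []
    for archive in sorted(glob.glob(os.path.join(directory, pattern))):
        # Mode "r:*" lets tarfile detect the compression by itself.
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(path=directory, **extra)
        unpacked.append(archive)
    return unpacked
```

Because the files land next to the archive, the directory afterwards holds both the tar file and its unpacked contents.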

Submit a Few CMKIN Jobs To The Batch Queue

Here, we run a Monte Carlo event generator.

For zprime(700) -> mu mu events

 
cd $SCRIPTS
./zprime700_mumu_cmkin_condor.csh 10 1000 100

would submit 10 jobs to the condor queue. Each job (also known as a process) processes a different incremental run number, with the first starting at number 1000. Each process generates 100 events. The random number seed is different for each of the processes. These jobs should finish fairly quickly, depending on how busy the batch system is. Once the processes are done, you should receive an email message from the batch system. For details on how to interact with the batch system, in particular how to check on the status of a job, see Batch Systems.
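The meaning of the three arguments (number of jobs, first run number, events per job) can be spelled out with a small Python sketch. This is only an illustration of the bookkeeping described above. The function name plan_jobs is hypothetical and nothing here is taken from the submission scripts themselves.

```python
def plan_jobs(n_jobs, first_run, events_per_job):
    """Return (run_numbers, total_events) for one submission.

    Each job gets its own run number, counting up from `first_run`,
    and every job handles `events_per_job` events.
    """
    if n_jobs < 1 or events_per_job < 1:
        raise ValueError("n_jobs and events_per_job must be positive")
    run_numbers = list(range(first_run, first_run + n_jobs))
    return run_numbers, n_jobs * events_per_job
```

For the command shown above, plan_jobs(10, 1000, 100) gives run numbers 1000 through 1009 and a total of 1000 generated events.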

Once the jobs are finished, you can see the output data files with

 
ls $PNFS_PATH/zprime700_mumu

and its subdirectories, or

 
ls $PNFS_PATH/h300eemm

and its subdirectories, depending on the physics channel you selected.

Running the OSCAR Detector Simulation

In this step, we use the output of the Monte Carlo generator created in the previous step as an input to the GEANT4-based CMS detector simulation called OSCAR. Since OSCAR takes quite a while for each event, one should try with a few events first.

 
cd $SCRIPTS
./zprime700_mumu_oscar_condor.csh 10 1000 10

would submit 10 jobs to the condor queue. Each job (process) processes a different incremental run number, the first starting at run number 1000, so the jobs span run numbers 1000 through 1009. Each process runs the simulation for the first 10 generated events.

All this will take a while and you can monitor the progress of your jobs with the command

 
condor_q

as described in Batch Systems.

Running ORCA To Digitize The Hits Created By OSCAR

In this step, we use the output of the OSCAR detector simulation created in the previous step as an input to the CMS reconstruction framework, ORCA. The first step in reconstructing the data is digitization which simulates how the detector and electronics react to the input. To accomplish that, do:

 
cd $SCRIPTS
./zprime700_mumu_digis_condor.csh 10 1000 10

for condor.

Again, you can monitor the status of your jobs as described earlier.

Running ORCA To Reconstruct From The Digis

This processing step uses the digis created in the previous step for its input. Here the event is fully reconstructed, resulting in physics objects such as tracks, vertices, jets and so on. Finally, these objects are used to populate a Root Tree that can be analyzed in a separate Root session. Proceed as follows:

 
cd $SCRIPTS
./zprime700_mumu_exrootdigis_condor.csh 10 1000 10

for condor.

On completion of these jobs, the directories $PNFS_PATH/zprime700_mumu/NNN/exroot_digis (NNN running from 1000 through 1009) should contain files digis_zprime700_mumu_NNN.root (again with NNN running from 1000 through 1009) with 10 events each.
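One way to check that every job delivered its file is to build the expected file names from the pattern above and test whether each exists. The Python sketch below does that. It is a convenience written for this note and not part of the tutorial scripts, and the function names expected_root_files and missing_files are hypothetical.

```python
import os


def expected_root_files(pnfs_path, channel, first_run, n_jobs):
    """Build the expected output paths, one per run number.

    Follows the layout <pnfs_path>/<channel>/NNN/exroot_digis/
    digis_<channel>_NNN.root described in the text.
    """
    return [
        os.path.join(pnfs_path, channel, str(run), "exroot_digis",
                     "digis_%s_%d.root" % (channel, run))
        for run in range(first_run, first_run + n_jobs)
    ]


def missing_files(paths):
    """Return the subset of `paths` that are not existing regular files."""
    return [path for path in paths if not os.path.isfile(path)]
```

With the default channel you might call missing_files(expected_root_files(os.environ["PNFS_PATH"], "zprime700_mumu", 1000, 10)). Any path it returns belongs to a job whose log files deserve a closer look.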

More On Utility Scripts

Earlier we described the workings of the utility script prepare.csh and deferred discussion of the other utility script, save_to_dcache.csh, since it was not needed for this tutorial. That is because we wrote all output files locally.

Something that we need to do fairly often - especially when files get rather large and need to be stored safely for some time - is to move data files from local storage into the mass storage system, dcache. The utility script save_to_dcache.csh does that. Note that this utility takes a pair of directory names as arguments. These are the name of the directory where the original files are and the name of the mass storage directory that should receive the file copies respectively. The script copies all files from the named "source" directory to the named "target" directory so you should do whatever source directory cleanup is necessary before doing the copy. Since prepare.csh unwinds tar files into the same directory as the tar file itself, cleaning up before copying to dcache is especially important.
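Since everything in the source directory gets copied, it can help to list what is there before running the script. The Python sketch below produces such a listing. It is an illustration only and does not reproduce save_to_dcache.csh. The function name files_to_copy is hypothetical, and the sketch assumes that only regular files directly inside the source directory matter.

```python
import os


def files_to_copy(source_dir):
    """Return (name, size_in_bytes) for each regular file in source_dir.

    Subdirectories are skipped; whether the real script descends into
    them is not stated in this note, so treat that as an assumption.
    """
    entries = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries
```

Reviewing this list makes leftover unpacked files easy to spot, so they can be removed before the copy.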

Running Root To See Some Physics Results

The $ANALYSIS area contains some example Root scripts to analyse the data at various steps coming out of the tutorial. Note that there is a text file named README in the $ANALYSIS directory with all of the same information as below.

For now, we assume you selected the default channel, zprime700_mumu. First make sure you did:

source setup_tut.csh

Then go to the data area of this channel:

cd $PNFS_PATH/zprime700_mumu/rootfiles/

Set up the runtime environment so that the Root executable root.exe is on the path with

source $SCRIPTS/setcms.csh

Run Root to create the class templates to analyze the Root trees in that area:

root.exe

and, after getting the Root prompt, enter the following commands:

TChain h101("h101");
h101.Add("zprime700*.root");
h101.MakeClass("Zpmumu");
.q

Root will manufacture two files, Zpmumu.h and Zpmumu.C, which together define a class "template" to manipulate the zprime700_mumu data. In particular, this class contains a member function named Zpmumu::Loop which includes all the necessary machinery to loop over the events and present them one by one for analysis. As generated, this function does no analysis. You need to tailor it by declaring your own histograms and inserting analysis code to select events and fill those histograms yourself. Rather than take the time to do all that now, use the already completed version of Zpmumu.C that you checked out of the CVS repository. Copy over the example Root scripts, start Root and execute the scripts:

rm Zpmumu.C
cp ${ANALYSIS}/Zpmumu.C .
cp ${ANALYSIS}/plot_Zpmumu.C .

root.exe
gSystem->Load("libPhysics");
.L Zpmumu.C++
.x plot_Zpmumu.C++

When you are all done, Root should present you with a set of histograms very similar to the following one.

If you had chosen to look at the zprime700_ee channel instead, Root would present you with a set of histograms very similar to the following one.

The procedure for the channel h300eemm is analogous. First make sure you did:

source setup_tut.csh

Then go to the data area of this channel:

cd $PNFS_PATH/h300eemm/rootfiles/

Set up the runtime environment so that root.exe is on the path with

source $SCRIPTS/setcms.csh

Run Root to create the class templates to analyze the Root trees in that area:

root.exe 

TChain h101("h101");
h101.Add("h300eemm*.root");
h101.MakeClass("Higgs");
.q

Copy over the example Root scripts, start Root and execute the scripts:

rm Higgs.C
cp ${ANALYSIS}/Higgs.C .
cp ${ANALYSIS}/plot_Higgs.C .

root.exe
gSystem->Load("libPhysics");
.L Higgs.C++
.x plot_Higgs.C++

When you are all done, Root should present you with a set of histograms very similar to the following one.

For more detailed analysis of these data, see the separate note, "Using ExRootAnalysis For TTree Analysis."


Last updated: October 18, 2005.
This document is maintained by Hans Wenzel (wenzel@fnal.gov), Patrick Gartung (gartung@fnal.gov) and John Marraffino (marafino@fnal.gov)
Last modified: Wednesday, 09-Jan-2008 14:26:52 CST