Gen (E93-026) Batch Replay
98/11/20
Glen A. Warren
~cdaq/gen/Documents/batch_replay.txt

There are two automated ways to analyze a large number of runs: using
the silo and using the farm. The farm is preferred because it allows
parallel processing on a much larger scale. However, if the batch farm
is full, one can run the silo jobs instead.

All files can be found in /net/fs1/home/cdaq/gen/replay unless stated
otherwise. Credit goes to Martin Kaufmann for writing most of the
scripts involved.

SILO

The silo processing consists of two processes controlled by perl
scripts: silo_new.pl and silo_replay_new.pl. The silo_new.pl script
requests files from the silo and hopefully stays ahead of the replay.
The silo_replay_new.pl script checks for runs on the cache that are not
being analyzed by other processes (namely, other processes running the
same script) and replays them. The silo_replay_new.pl script can be run
from several machines so that several runs can be analyzed at once.

The silo_new.pl script is dumb. It assumes that the silo is always
working and that once a request is made the file will eventually appear
on the cache disks. If a request is made and the file never appears on
the cache, the two scripts deadlock: silo_replay_new.pl sits waiting
for a run to appear on the cache, while silo_new.pl sits waiting for a
run to be analyzed before it requests a new file. This should be
changed. If the script is left as is, I advise checking on silo_new.pl
several times a day.

The silo_new.pl script reads a list of runs from the file $list_in. It
keeps track of the requested files in $list_out. It checks for replayed
runs listed in the file $replayed. These files are all in
/net/fs5/work5/cdaq/silo_replay.

Before running silo_new.pl, do a "setup mss". As a consequence,
silo_new.pl can only be run from jlab or ifarm machines.
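The bookkeeping that silo_new.pl does with its three lists, and the
stall described above, can be illustrated with a short sketch. This is
not the code in silo_new.pl (which is perl); it is a minimal Python
illustration written under assumptions. The names next_requests,
stalled_requests, max_ahead and timeout_s are all hypothetical, and the
idea of flagging requests that have waited too long is only one
possible way to address the "this should be changed" remark.

```python
def next_requests(list_in, requested, replayed, max_ahead=5):
    """Choose which runs to request from the silo next.

    list_in   : all runs to analyze, in order (role of $list_in)
    requested : runs already requested (role of $list_out)
    replayed  : runs already replayed (role of $replayed)
    max_ahead : hypothetical cap on runs requested but not yet replayed
    """
    requested_set = set(requested)
    replayed_set = set(replayed)
    # Runs that were requested but have not been replayed yet.
    outstanding = [r for r in requested if r not in replayed_set]
    room = max(0, max_ahead - len(outstanding))
    # Runs from the input list that have not been requested yet.
    pending = [r for r in list_in if r not in requested_set]
    return pending[:room]


def stalled_requests(request_times, on_cache, now, timeout_s):
    """Return requested runs that never showed up on the cache.

    request_times : dict mapping run number -> time the request was made
    on_cache      : set of runs whose files are present on the cache
    now           : current time, in the same units as request_times
    timeout_s     : hypothetical waiting time after which a request
                    is considered stuck
    """
    return [run for run, t in sorted(request_times.items())
            if run not in on_cache and now - t > timeout_s]
```

A request loop built on this would call next_requests each cycle and
re-issue (or report) whatever stalled_requests returns, instead of
waiting indefinitely on a file that will never arrive.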

Current definitions of variables in silo_new.pl:

    $list_in = "/net/fs5/work5/cdaq/silo_replay/runlist_q1_new.out";
    $list_out = "/net/fs5/work5/cdaq/silo_replay/requested_runs_new.dat";
    $replayed = "/net/fs5/work5/cdaq/silo_replay/replayed_runs_new.dat";

The silo_replay_new.pl script reads a list of runs from the file $list.
It checks whether a run has been "processed" (meaning that another job
is already analyzing or waiting to analyze the run) by checking whether
the run is listed in the $processed file. Once the run is replayed, the
run number is recorded in the $replayed file. These files are all in
/net/fs5/work5/cdaq/silo_replay. Because "processed" runs are tracked
separately from replayed runs, several silo_replay_new.pl scripts can
run simultaneously.

Current definitions of variables in silo_replay_new.pl:

    $list = "/net/fs5/work5/cdaq/silo_replay/requested_runs_q05.dat";
    $processed = "/net/fs5/work5/cdaq/silo_replay/processed_runs_q05.dat";
    $replayed = "/net/fs5/work5/cdaq/silo_replay/replayed_runs_q05.dat";

To analyze the run, the silo_replay_new.pl script calls the shell
scripts silo_replay1 and silo_replay2, which replay the two parts of a
run if applicable. These two scripts call different "REPLAY.PARM" files
to distinguish the output of the two parts of the run, namely "_1" and
"_2".

To use the silo scripts, first run silo_new.pl. Then wait for 5 or so
files to appear on the cache. At that point, begin to run the
silo_replay_new.pl script.

FARM

The submission of jobs to the farm is handled by one script, farm.pl.
The farm.pl script reads a list of runs from $list (currently defined
as /home/cdaq/gen/book/runlist/runlist_q05.out) and runs the
jsubmit.script script for each good run. jsubmit.script defines the
necessary shell variables and submits the run to the farm.

The script to be run by the farm is farm_replay.pl. This script
analyzes the run, both parts if necessary, and gzips the ntuples.
It also modifies the REPLAY_FARM_PART#.PARM files to use the local
scratch disk ($WORKDIR) of the farm machine.

Currently the system is set to write the ntuples to the cdaq directory
on the work disk. If there are 10 farm jobs writing simultaneously to
this disk, the jobs can be significantly slowed down by network
traffic. I suggest that the ntuples be written to the local scratch
disk first and then copied over to the work disk (or some suitable
storage disk). This would involve a minor change in the
REPLAY_FARM_PART#.PARM files, I hope.

I do not have a lot of experience running this script. I did it once.
Unless the farm is booked solid, I advise that you use the farm over
the silo replay. The farm.pl script must be run from a jlab machine.
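
The suggestion made in the FARM section, to write the ntuples to local
scratch first and copy them to the work disk afterwards, can be
illustrated with a short sketch. It is written in Python for
illustration only; the real change would be made in farm_replay.pl and
the REPLAY_FARM_PART#.PARM files, which are not reproduced here. The
function name stage_and_copy, the produce callback (a stand-in for the
replay writing an ntuple) and the ".part" temporary suffix are all
hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def stage_and_copy(produce, scratch_dir, dest_dir, name):
    """Write a file on local scratch, gzip it, then copy it to dest_dir.

    produce     : callable taking a Path and writing the output there
                  (hypothetical stand-in for the replay job)
    scratch_dir : local scratch directory (role of $WORKDIR)
    dest_dir    : final directory on the work or storage disk
    name        : file name of the output, without the .gz suffix
    """
    scratch = Path(scratch_dir)
    dest = Path(dest_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    dest.mkdir(parents=True, exist_ok=True)

    # All heavy writing happens on the local disk, not over the network.
    local = scratch / name
    produce(local)

    # Compress locally so that less data crosses the network.
    zipped = local.with_name(local.name + ".gz")
    with open(local, "rb") as src, gzip.open(zipped, "wb") as out:
        shutil.copyfileobj(src, out)
    local.unlink()

    # Copy in a single transfer under a temporary name, then rename,
    # so nobody picks up a half-copied file.
    partial = dest / (zipped.name + ".part")
    final = dest / zipped.name
    shutil.copy2(zipped, partial)
    partial.replace(final)
    zipped.unlink()
    return final
```

With this arrangement each farm job touches the shared disk once, at
the end, rather than throughout the replay.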