E94-010 Replay
by Seonho Choi
Last updated on 11/11/98
0. Setting up HALLA_DIR
To replay the E94010 data, first set up HALLA_DIR, generally on the
/work disk. Create your own directory under /work/halla/e94010,
preferably named after your user name so that it is easy to
identify. Once the directory exists, set HALLA_DIR by typing
setenv HALLA_DIR /work/halla/e94010/{username}
1. Submitting jobs to farm machines
Once HALLA_DIR is set up, change to that directory using "cd
$HALLA_DIR". To submit jobs, you need to copy the master script from
my own HALLA_DIR. Type
cp /work/halla/e94010/choi/replay/submit $HALLA_DIR
Now you can submit jobs by typing "submit 1508 1522".
You can give as many run numbers as you want. To specify a range of
runs, type
submit 2001 - 2030
which submits 30 jobs, one for each run from 2001 to 2030. (What
happens if there is no run corresponding to, say, 2015? The script
automatically skips that run number and no job is submitted for
it. So in principle, if you want to analyze all the runs from
E94010, you can simply type "submit 0 - 99999", although it will
try to analyze all the junk runs too.)
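The argument syntax described above can be pictured with a short Bourne-shell sketch. This is not the actual submit script, whose internals are not shown here; the function name expand_runs is hypothetical, and the skipping of nonexistent runs is not modeled. It only illustrates how single run numbers and "first - last" ranges might be turned into a list of runs.

```shell
#!/bin/sh
# Hypothetical sketch: expand arguments such as "1508 1522" or
# "2001 - 2030" into one run number per line. The real submit script
# may parse its arguments differently.
expand_runs() {
    while [ "$#" -gt 0 ]; do
        if [ "$#" -ge 3 ] && [ "$2" = "-" ]; then
            # "first - last": emit every run in the inclusive range.
            run=$1
            last=$3
            while [ "$run" -le "$last" ]; do
                echo "$run"
                run=$((run + 1))
            done
            shift 3
        else
            # A single run number.
            echo "$1"
            shift
        fi
    done
}
```

For example, "expand_runs 1508 2001 - 2003" prints 1508, 2001, 2002 and 2003, one per line.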
2. What's happening after that?
If everything goes well, the job should start on the batch farm
machines as soon as one of them is available. (Currently, espace runs
only on the Solaris and Linux batch farms.) You can check the status
of your jobs by looking at
http://farms0.jlab.org:9090/FarmStatus?QueryJobStat
with Netscape.
3. What do you get after all this effort?
Once a job is complete (successfully or not), you will get several
e-mails for each job. If the job finished successfully, you can
discard these e-mails, but for jobs killed for no apparent reason,
you should keep the e-mails and probably send them to me.
Besides these e-mails, five output files are saved, each of them
easily identified by run number.
final ntuple - saved to tape with stub file in
/mss/home/{username}/ntuples/{date}
some histograms - saved to $HALLA_DIR/replay/hbooks
scaler output - saved to $HALLA_DIR/replay/scalers
program output - saved to $HALLA_DIR/replay/batch_result
error output - saved to $HALLA_DIR/replay/batch_result
If a job is killed prematurely, please send me an e-mail with
detailed information so that I can investigate the situation.
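To see quickly what a given run left on disk, something like the following Bourne-shell sketch may help. The function name find_run_outputs is hypothetical, and it assumes only that each file name contains the run number somewhere; the exact naming scheme is not specified above. The ntuple itself goes to tape and is not covered.

```shell
#!/bin/sh
# Hypothetical helper: list files under $HALLA_DIR/replay whose names
# contain the given run number. Assumes HALLA_DIR is already set.
# Note that the match is a plain substring match, so a short number
# such as 150 would also match files for run 1508.
find_run_outputs() {
    run=$1
    for sub in hbooks scalers batch_result; do
        for f in "$HALLA_DIR/replay/$sub"/*"$run"*; do
            # Skip the unexpanded pattern when nothing matches.
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    done
    return 0
}
```

Calling "find_run_outputs 1508" would then print the histogram, scaler, and batch output files found for run 1508, if any exist.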
4. What can you do with the output files?
You can immediately check your histograms in
$HALLA_DIR/replay/hbooks with PAW. I tried to include almost all
the histograms in Xiaodong's kumac file. If you need to do a more
detailed analysis, you need to copy the ntuple from tape to
disk. You can do this by typing
jget stub_file_name_for_your_ntuple complete_path_for_disk_file
Depending on the load, this process may take a whole day.
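When several ntuples are needed, the jget commands can be generated rather than typed one by one. The Bourne-shell sketch below is only an illustration: the function name print_jget_commands is hypothetical, it assumes every file in the given directory is a stub file, and it only prints the commands (in the two-argument form shown above) so that they can be inspected before being run.

```shell
#!/bin/sh
# Hypothetical helper: print one jget command per stub file found in
# stub_dir, copying each ntuple to a file of the same name in dest_dir.
# Nothing is executed; pipe the output to sh to actually run it.
print_jget_commands() {
    stub_dir=$1
    dest_dir=$2
    for stub in "$stub_dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        if [ -e "$stub" ]; then
            echo "jget $stub $dest_dir/$(basename "$stub")"
        fi
    done
}
```

A typical call would pass the stub directory /mss/home/{username}/ntuples/{date} as the first argument and a directory on the /work disk as the second.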
5. Help
In case of problems, you can contact me via x5007 or choi@jlab.org.
6. Known problems
a) On some farm machines, copying the raw data from the silo to disk
is problematic. If the raw data are not copied properly, gdh_espace
exits immediately.
b) Some data files are corrupted, and gdh_espace exits when the
"corrupt synch : fatal error in decoding" error occurs.
c) After a successful replay of the raw data, ntuples may not be
copied properly to the silo. This is probably a problem with the
batch manager, and I am still investigating.
d) For no obvious reason, gdh_espace sometimes won't start at all or
dies in the middle of the replay. The cause of these symptoms is
not yet known.
e) The problem related to split files reported earlier has been fixed
completely.
f) There may be other problems still to be discovered.
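Jobs that hit the decoding error of item b) can be picked out by searching the saved program and error output. The Bourne-shell sketch below is an assumption-laden illustration: the function name find_corrupt_runs is hypothetical, and it searches only for the phrase "fatal error in decoding", since the exact spacing of the full message in the log files is not certain.

```shell
#!/bin/sh
# Hypothetical helper: list the files in $HALLA_DIR/replay/batch_result
# that contain the decoding error quoted in item b).
# Assumes HALLA_DIR is already set.
find_corrupt_runs() {
    # grep -l prints only the names of matching files; a non-zero
    # status (no match, or no files at all) is deliberately ignored.
    grep -l "fatal error in decoding" \
        "$HALLA_DIR"/replay/batch_result/* 2>/dev/null || true
}
```

The run numbers can then be read off the listed file names.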