E94-010 Replay

by Seonho Choi
Last updated on 11/11/98



0. Setting up HALLA_DIR

   To replay the E94-010 data, the first thing to do is to set up
   HALLA_DIR, usually on the /work disk. Create your own directory
   under /work/halla/e94010; using your own user name makes the
   directory easy to identify. Once that directory is created, set up
   HALLA_DIR by typing
	setenv HALLA_DIR /work/halla/e94010/{username}
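
   For example, assuming a csh/tcsh shell (setenv is a csh built-in),
   the whole step might look like this, with {username} replaced by
   your own user name:
	mkdir /work/halla/e94010/{username}
	setenv HALLA_DIR /work/halla/e94010/{username}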

1. Submitting jobs to farm machines

   Once HALLA_DIR is set up, change to that directory with "cd
   $HALLA_DIR". To submit jobs, you need to copy the master script
   from my own HALLA_DIR:
	cp /work/halla/e94010/choi/replay/submit $HALLA_DIR

   Now you can submit jobs simply by typing "submit 1508 1522".
   You can give as many run numbers as you want. If you want to
   specify a range of runs, simply type, for example,
	submit 2001 - 2030
   and it will submit 30 jobs, one for each run from 2001 to 2030.
   (What happens if there is no run corresponding to, say, 2015? The
   script automatically skips that run number and no job is
   submitted. So in principle, if you want to analyze all the runs
   from E94-010, you could simply type "submit 0 99999", although it
   would also try to analyze all the junk runs.)
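
   Putting it together, a typical submission session might look like
   this (the run numbers here are only examples):
	cd $HALLA_DIR
	cp /work/halla/e94010/choi/replay/submit $HALLA_DIR
	submit 1508 1522
	submit 2001 - 2030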

2. What happens after that?

   If everything goes well, the job should start on the batch farm
   machines as soon as one of them becomes available. (Currently,
   espace runs only on the Solaris and Linux batch farms.) You can
   check the status of your jobs by pointing Netscape at
	http://farms0.jlab.org:9090/FarmStatus?QueryJobStat

3. What do you get after all this effort?

   Once a job is complete (successful or not), you will receive
   several e-mails for it. If the job finished successfully, you can
   discard these e-mails, but for jobs killed for no apparent reason,
   please keep the e-mails and send them to me.

   Besides these e-mails, five output files are saved for each run,
   each of them easily identified by the run number (see the example
   after the list):

	final ntuple    - saved to tape with stub file in
		          /mss/home/{username}/ntuples/{date}
	some histograms - saved to $HALLA_DIR/replay/hbooks
	scaler output   - saved to $HALLA_DIR/replay/scalers
	program output  - saved to $HALLA_DIR/replay/batch_result
	error output    - saved to $HALLA_DIR/replay/batch_result
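
   For example, for run 1508 the on-disk outputs could be located
   like this (a sketch; the exact file naming pattern may differ):
	ls $HALLA_DIR/replay/hbooks/*1508*
	ls $HALLA_DIR/replay/scalers/*1508*
	ls $HALLA_DIR/replay/batch_result/*1508*
	ls /mss/home/{username}/ntuples/*/*1508*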

   In case of jobs killed prematurely, you can send me an e-mail with
   detailed information so that I can investigate the situation.

4. What can you do with the output files?

   You can immediately inspect the histograms in
   $HALLA_DIR/replay/hbooks with PAW. I tried to include almost all
   the histograms in Xiaodong's kumac file. If you need to do a more
   detailed analysis, you need to copy the ntuple from tape to
   disk. You can do this by typing
	jget stub_file_name_for_your_ntuple complete_path_for_disk_file
   Depending on the load, this process may take a whole day.
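
   For example, for run 1508 the command might look like the
   following (the stub and disk file names here are only illustrative
   placeholders):
	jget /mss/home/{username}/ntuples/{date}/run1508.ntup $HALLA_DIR/run1508.ntup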

5. Help

   In case of problems, you can contact me via x5007 or choi@jlab.org.

6. Known problems

   a) On some farm machines, copying the raw data from the silo to
      disk is problematic. If the raw data are not copied properly,
      gdh_espace exits immediately.

   b) Some data files are corrupted, and gdh_espace exits when the
      "corrupt synch : fatal error in decoding" error occurs.

   c) After a successful replay of the raw data, the ntuples are not
      copied properly to the silo. This appears to be a problem with
      the batch manager, and I am still investigating.

   d) Without any obvious reason, gdh_espace sometimes won't start at
      all or dies in the middle of the replay. I have no clue about
      these symptoms.

   e) The problem related to split files reported earlier has been
      completely fixed.

   f) There may be other, still unknown problems to be discovered
      later.

