Archived Messages for CLAS_SLOW_CONTROL_1996@cebaf.gov: LeCroy report on HV (detailed, non-experts can ignore)

Mark M. Ito (marki@CEBAF.GOV)
Wed, 17 Apr 1996 17:27:18 -0400 (EDT)

Hi Mark,

The following is my report on our most recent visit to CEBAF.

The goal of our most recent visit to CEBAF (4/9-4/11) was to diagnose a 1458
ARCNET problem reported by CEBAF Hall B personnel. This problem is
characterized by a loss of ARCNET communication with an HV mainframe which
can only be reset by cycling the mainframe power. The loss of ARCNET
communication (referred to as a "LAC" below) has been observed to occur
at intervals of anywhere from 2 hours to 2 days. During our visit, we
were typically able to "cause" a LAC after about 2 hours of operation. In
order to increase ARCNET activity, we changed the measured current deadzones
to zero. This seemed to increase the likelihood of a LAC.
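
To illustrate why zero deadzones raise ARCNET traffic, here is a minimal
sketch. It assumes a deadzone works as a simple reporting threshold: a
measured value is sent only when it differs from the last reported value by
more than the deadzone. That rule, the function name and the sample values
are all hypothetical and are not taken from the 1458 firmware.

```python
def count_updates(samples, deadzone):
    """Count how many samples would be reported for a given deadzone.

    Assumed rule (not from the 1458 firmware): a sample is reported only
    when it differs from the last reported value by more than the deadzone.
    """
    reported = 0
    last = None
    for value in samples:
        if last is None or abs(value - last) > deadzone:
            reported += 1
            last = value
    return reported


# Hypothetical measured currents with a little noise and two real steps.
samples = [100.0, 100.2, 99.9, 100.1, 103.0, 103.1, 102.8, 100.0]

print(count_updates(samples, 1.0))  # only the real steps are reported
print(count_updates(samples, 0.0))  # every changed reading is reported
```

With a nonzero deadzone the noise is filtered out. With a zero deadzone
nearly every reading becomes a message, which is the extra load used here to
provoke a LAC.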

As part of our debugging, we used new mainframe firmware to display
additional ARCNET hardware and software status information. Also, we
observed the interrupt line (IRQ5) from the ARCNET interface card on a
portable DSO. During one LAC, the new status information indicated that the
mainframe was waiting to receive ARCNET packets, while during a second
LAC, the mainframe was waiting to transmit. Also during a LAC, the IRQ5
line was high, indicating that the mainframe had somehow missed an interface
card interrupt. (The interrupt service chip on the mainframe motherboard is
configured as edge-sensitive by DOS.)

After examination of Hall B's VME ARCNET controller SW driver and DSO
interrupt signal traces, we concluded that receive/transmit packets were
likely arriving/leaving in bursts. These bursts could cause a second
interrupt by the ARCNET interface (holding the IRQ5 line high) while the
mainframe was still inside its interrupt service routine (ISR). The bursts
of ARCNET activity are most likely due to the VME controller CPU servicing
EPICS network messages while suspending ARCNET communications. (Testing at
LeCroy of the mainframe's ARCNET communication does not generate such
bursts of send/receive packets, since the test systems are only doing one
task.)
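
The missed-interrupt mechanism can be sketched with a toy model of an
edge-sensitive input, which registers a request only on a low-to-high
transition of the line. The traces below are invented for illustration and
are not the DSO captures; the function name is hypothetical.

```python
def rising_edges(levels):
    """Count low-to-high transitions, which is all an edge-sensitive input sees."""
    edges = 0
    previous = 0
    for level in levels:
        if previous == 0 and level == 1:
            edges += 1
        previous = level
    return edges


# Two well-separated requests: the line drops between them, giving two edges.
separated = [0, 1, 1, 0, 0, 1, 1, 0]

# Burst: the second request arrives before the first is cleared, so the line
# never returns low and the two requests produce only one edge.
burst = [0, 1, 1, 1, 1, 1, 1, 1]

print(rising_edges(separated))
print(rising_edges(burst))
```

In the burst case the line ends up stuck high with no further edge to come,
which matches the IRQ5 state observed during a LAC.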

To address a possible second "prompt" interrupt, a second check of the
ARCNET interface's interrupt status register was added to the end of the ISR.
(This extra status check allows for the interrupt service procedure to be
executed again if the new status indicates a pending interrupt.) The 1458
control program was recompiled on a PC portable on-site and copied to a
floppy disk. An HV mainframe was configured to allow for its control
program to begin execution from an installed floppy disk. With the
mainframe running with the new control program, further ARCNET testing was
done. The LAC problem did NOT occur for the rest of the day nor during an
overnight run. As before, we had configured the measured current deadzones
to zero to increase ARCNET activity.
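
The ISR change described above can be sketched as follows. This is a
simulation under stated assumptions, not the 1458 control program: FakeCard,
its fields and both ISR functions are hypothetical, and the re-check is
written as a loop, which generalizes the single extra status check to any
number of back-to-back events.

```python
class FakeCard:
    """Hypothetical stand-in for the ARCNET interface card (not its real registers)."""

    def __init__(self, burst_events):
        self.pending = 1                  # the event that raised the IRQ edge
        self.burst_events = burst_events  # events that will arrive mid-ISR

    @property
    def irq_high(self):
        # The card holds the IRQ line high while anything is unserviced.
        return self.pending > 0

    def status_pending(self):
        return self.pending > 0

    def service_one(self):
        # A burst event arrives before the current one is cleared, so the
        # IRQ line never drops low in between.
        if self.burst_events > 0:
            self.burst_events -= 1
            self.pending += 1
        self.pending -= 1


def isr_single_pass(card):
    """Original behavior (assumed): check status once, service, return."""
    if card.status_pending():
        card.service_one()


def isr_with_recheck(card):
    """Re-check the status before returning and service again if needed."""
    while card.status_pending():
        card.service_one()


old = FakeCard(burst_events=1)
isr_single_pass(old)
print(old.irq_high)   # line still high, and no new edge will ever occur

new = FakeCard(burst_events=1)
isr_with_recheck(new)
print(new.irq_high)   # line released, so the next event produces an edge
```

The point of the re-check is that the ISR never returns while the card is
still asserting the line, so an edge-sensitive input cannot be left waiting
for an edge that will not come.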

Since our return to LeCroy, we have made additional (control program) ISR
modifications to ensure that a second "prompt" ARCNET packet interrupt will
not be missed. This new version of mainframe FW will be installed on our
next visit. We will need to do further testing at CEBAF to ensure the LAC
problem is gone and not possibly related to an apparent DC power problem
discussed below.

While pursuing LAC problems, Hall B personnel pointed out two mainframes (of
the eight installed) with a front panel switch problem (FPSWP) wherein a
powered-up mainframe could not be turned off via its front panel switch.
This problem was found to be related to a 486 motherboard which, when moved
to a working mainframe, caused the same FPSWP (in this otherwise working
mainframe). Both mainframes with the FPSWP were 486 based (in fact the
other 6 mainframes are all 386 based). We took one of the 486
motherboards back to LeCroy for testing. It worked fine in our test system.
We have since concluded that the FPSWP is some sort of DC power problem.
All mainframes at CEBAF have 16 modules while the LeCroy engineering test
system has 3. Indeed, we have since asked that a few modules be removed
from the remaining 486 based mainframe at CEBAF, and Hall B personnel have
observed the FPSWP to disappear.

We are currently debugging the FPSWP with a 486 based mainframe and 16
modules. Since 16 module testing has been done with 486 based mainframes at
LeCroy in the past (as part of product development), we can only conclude
that some power supply, module, motherboard, etc. is out of specification.
Since the other 386-based mainframes (which don't exhibit the FPSWP) in Hall
B at CEBAF typically have 16 modules, this would at least seem to indicate
some new problem in later production versions of the 1458 which happen to
have 486 motherboards.

Late News: We have been able to reproduce the FPSW problem on a new
(486-based) mainframe at LeCroy. Stay tuned for further updates.

Mike Green
Mike_G@pisun.lecroy.com
LRS Engineering
LeCroy Corporation (LCRY)