## The Scalable Readout System (SRS) based DAQ for the PRad GEMs

K. Gnanvo, N. Liyanage, S. Boyarinov, B. Moffit, B. Raydo, K. Adhikari and D. Dutta for **The PRad Collaboration**

## Contents

- 1 The APV25-SRS Readout
  - 1.1 Overview
  - 1.2 The PRad Readout Electronics
- 2 The Upgrade of APV-SRS system for PRad
  - 2.1 Implementing 10 Gbit/sec Ethernet Link
  - 2.2 Rate tests
  - 2.3 Buffering
- 3 Integration of SRS into CODA
  - 3.1 Ongoing Efforts
- 4 Summary

## 1 The APV25-SRS Readout

### 1.1 Overview

To read out signals from the PRad GEM detector, we will use the APV25 chip based Scalable Readout System (SRS) developed at CERN by the RD51 [1] collaboration. The APV25 is an analog chip developed by Imperial College London for the silicon trackers of the CMS experiment. It has subsequently been adopted by several experiments, such as the COMPASS trackers at CERN, the STAR FGT at BNL and others. It is also planned for the tracking detectors in the SBS project. The APV25 chip samples 128 channels in parallel at 20 MHz or 40 MHz and stores 192 analog samples per channel, each covering 50 ns or 25 ns respectively. Following a trigger, up to 30 consecutive samples from the buffer are read out and transmitted to an ADC unit that de-multiplexes the data from the 128 channels and digitizes the analog information. Operating in the 20 MHz mode with the 30-sample readout will give a dynamic time window of 30 $\times$ 50 ns = 1.5 $\mu$s.
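The sampling figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and constant names are illustrative and not part of any SRS software.

```python
# Sampling parameters quoted in the text above.
SAMPLE_PERIOD_NS = {20: 50.0, 40: 25.0}  # clock in MHz -> ns covered by one sample
PIPELINE_DEPTH = 192       # analog samples stored per channel
MAX_READOUT_SAMPLES = 30   # consecutive samples read out per trigger


def window_ns(clock_mhz: int, n_samples: int) -> float:
    """Time span covered by n_samples consecutive samples, in ns."""
    if not 1 <= n_samples <= MAX_READOUT_SAMPLES:
        raise ValueError("n_samples must be between 1 and 30")
    return n_samples * SAMPLE_PERIOD_NS[clock_mhz]


def pipeline_depth_ns(clock_mhz: int) -> float:
    """Time span covered by the full 192-sample analog buffer, in ns."""
    return PIPELINE_DEPTH * SAMPLE_PERIOD_NS[clock_mhz]


if __name__ == "__main__":
    print(window_ns(20, 30))      # 30 samples at 20 MHz
    print(window_ns(20, 3))       # 3 samples at 20 MHz
    print(pipeline_depth_ns(20))  # full buffer at 20 MHz
```

At 20 MHz the 30-sample window corresponds to 1500 ns, while a 3-sample readout covers 150 ns.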

The SRS system consists of the following components:

- SRS-APV25 hybrid cards mounted on the detector. These cards contain the 128-channel APV25 chip, which reads data from the detector, multiplexes the data, and transmits the analog signals to the ADC card via standard commercial HDMI cables.
- SRS-ADC unit that houses the ADC chips, which de-multiplex the data and convert them into digital format.
- SRS-FEC card, which handles the clock and trigger synchronization. A single Front End Card (FEC) and ADC card combination can read data from up to 16 APV hybrid cards. The data from the FEC are sent either directly to the data acquisition computer (DAQ PC) or to the SRS-SRU via a 1 Gb Ethernet copper link.
- SRS-SRU, Scalable Readout Unit, handles communication between multiple (up to 40) FEC cards and the data acquisition computer. It also distributes the clock and trigger synchronization to the FEC cards.
- The data acquisition computer, which could be part of a larger DAQ system as one of the readout controllers.

A schematic view of the system is shown in Fig. 1.

### 1.2 The PRad Readout Electronics

Figure 1: A Schematic of the SRS Readout and its components for the PRad chambers.

A total of 9216 electronics channels are needed to read out the PRad GEM chambers. This amounts to 72 SRS-APV25 cards (128 channels per card). The data acquisition speed of an APV25 chip is limited by the fact that the outputs from its 128 channels are multiplexed onto a single serial line. With the 3-sample APV25 readout mode planned for the PRad GEM configuration, the time to get the data out of the APV25 chip is of the order of 10 $\mu$s, which gives an intrinsic limiting rate of the order of 100 kHz. However, the actual rate limitation for the PRad readout comes from the data transmission links downstream of the APV25 chip. Tests with the SRS indicated that the data volume for all 128 channels of one SRS-APV25 card reading 3 samples is approximately 1.2 kB per event at 100% occupancy (no zero suppression). Thus, at a 5 kHz trigger rate, the data rate per card would be 6 MB/s. The SRS-ADC/SRS-FEC card combination can handle up to 16 SRS-APV25 cards and send data to the SRS-SRU through a 1 Gb/s (125 MB/s) Ethernet link. However, in the current implementation of the DTCC link [2], the speed of the FEC-to-SRU transfer is limited to 640 Mbps (800 Mbps line speed $\times$ 80% to account for the 8b10b line encoding overhead). With only 640 Mbps (80 MB/s) available, we plan to use 6 SRS-ADC/SRS-FEC cards to read out all 72 SRS-APV25 cards, limiting the number of SRS-APV25 cards per SRS-FEC to 12 for a total SRS-FEC transfer rate of 72 MB/s (12 $\times$ 6 MB/s). The SRS-FEC cards will be connected to 2 SRS-SRU boards (3 SRS-FECs per SRS-SRU). Pictures of the different components of the system that will be used to read out the PRad GEM chambers are shown in Fig. 1.
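The bandwidth budget in this section can be reproduced with a short calculation. This is a sketch only: it assumes decimal units (1 kB = 1000 bytes, 1 MB = 10^6 bytes) and ignores any protocol overhead beyond the 8b10b encoding; the function names are illustrative.

```python
import math

# Figures quoted in the text.
N_CHANNELS = 9216
CHANNELS_PER_APV = 128
EVENT_KB_PER_APV = 1.2      # 3 samples, 100% occupancy, no zero suppression
TRIGGER_KHZ = 5.0
LINE_SPEED_MBPS = 800.0     # DTCC line speed
ENCODING_EFFICIENCY = 0.8   # 8b10b: 8 payload bits per 10 line bits


def n_apv_cards() -> int:
    """Number of APV25 hybrid cards needed for all channels."""
    return N_CHANNELS // CHANNELS_PER_APV


def apv_rate_mb_s() -> float:
    """Data rate of one APV card: kB/event * kHz = MB/s."""
    return EVENT_KB_PER_APV * TRIGGER_KHZ


def link_capacity_mb_s() -> float:
    """Usable FEC-to-SRU bandwidth in MB/s after 8b10b overhead."""
    return LINE_SPEED_MBPS * ENCODING_EFFICIENCY / 8.0


def fec_plan(apv_per_fec: int):
    """Return (number of FECs, load per FEC in MB/s, fits within link?)."""
    n_fec = math.ceil(n_apv_cards() / apv_per_fec)
    load = apv_per_fec * apv_rate_mb_s()
    return n_fec, load, load <= link_capacity_mb_s()


if __name__ == "__main__":
    for apv_per_fec in (16, 12):
        print(apv_per_fec, fec_plan(apv_per_fec))
```

With 16 cards per FEC the load (about 96 MB/s) exceeds the 80 MB/s link, whereas 12 cards per FEC gives about 72 MB/s on each of 6 FECs, which is the configuration chosen in the text.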

## 2 The Upgrade of APV-SRS system for PRad

The original SRU firmware was designed to consolidate all the data from the Front End Cards (FECs) and transfer the data to the DAQ system via a standard 1 Gbit/sec Ethernet (copper) link. This was seen as one of the bottlenecks to faster data transfer. One of the strategies for increasing the DAQ rate was to upgrade the SRU firmware to enable 10 Gbit/sec data transfer via an optical fiber link. With the 10 Gbit/sec link, the data from the 3 FEC cards will be sent to the DAQ PC at a rate of about 2 Gbps.

### 2.1 Implementing 10 Gbit/sec Ethernet Link

The SRU firmware was upgraded to use 10 Gbit/sec Ethernet. The 1 GbE core was replaced with a 10 GbE core, and the data paths in the firmware were widened and sped up to take advantage of the upgraded link speed.

### 2.2 Rate tests

Figure 2: Test of data transfer rates with the old 1 Gbit/sec capable SRU firmware (blue) and the new upgraded 10 Gbit/sec capable firmware (red). Plot title: 1/10GbE SRU Data Rate vs Trigger Rate (3 FEC, 12 APV per FEC, 3 TS).

Both the old and the upgraded firmware were tested to demonstrate the bottleneck of the 1 Gbit/sec firmware and its removal with the 10 Gbit/sec firmware. An SRU setup with 3 FEC cards and 12 APV cards per FEC, reading 3 samples per channel of each APV card, was tested with both firmware versions. These conditions mimic the expected conditions of the PRad experiment. The results are shown in Fig. 2, where the blue points are for the 1 Gbit/sec firmware and the red points are for the 10 Gbit/sec firmware. The tests indicate that the 1 Gbit/sec firmware saturates at 3 kHz, demonstrating the bottleneck. The results with the upgraded firmware indicate that the estimated PRad DAQ rate of 5 kHz, and rates significantly above it, can be readily sustained with the 10 Gbit/sec transfer rate.
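The observed saturation of the 1 Gbit/sec firmware near 3 kHz is close to what a simple bandwidth estimate gives for this test configuration. The sketch below assumes the 1.2 kB per APV card per event quoted in Section 1.2, decimal units, and no Ethernet or protocol overhead, so it is an idealized upper bound rather than a prediction of the measured curve.

```python
EVENT_KB_PER_APV = 1.2   # 3 samples, no zero suppression (Section 1.2)
N_FEC = 3                # test configuration
APV_PER_FEC = 12


def event_size_kb(n_fec: int = N_FEC, apv_per_fec: int = APV_PER_FEC) -> float:
    """Total event size at the SRU output in kB."""
    return n_fec * apv_per_fec * EVENT_KB_PER_APV


def saturation_khz(link_gbit_s: float) -> float:
    """Trigger rate at which the SRU-to-PC link is full, in kHz."""
    link_mb_s = link_gbit_s * 1000.0 / 8.0   # Gbit/s -> MB/s
    return link_mb_s / event_size_kb()       # (MB/s) / (kB/event) = kHz


if __name__ == "__main__":
    for speed in (1.0, 10.0):
        print(f"{speed:4.0f} Gbit/s link: {saturation_khz(speed):.1f} kHz")
```

Under these assumptions the 1 Gbit/sec link fills at roughly 2.9 kHz, while the 10 Gbit/sec link would not fill until roughly 29 kHz, so the SRU output link is no longer the limiting element at 5 kHz.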

### 2.3 Buffering

Additional improvements to the firmware, such as buffering the triggers on the FEC, have also been implemented. Buffering resulted in a very significant reduction in the deadtime, to < 15% at the desired 5 kHz rate (see Fig. 4).

## 3 Integration of SRS into CODA

The integration of SRS readout and data transfer into CODA has the option of using either a PCI Express (PCIe) based Trigger Interface (TI) or a VME based TI. Libraries and CODA readout lists to read the SRS data into CODA have been developed for both TI options.

The PCIe TI is a new device used to integrate a standard desktop or server PC (with a PCIe bus) with the standard CODA [3]. It is compatible with the newer CODA systems utilizing the pipelined Trigger Supervisor [4] that will be used for CLAS12. It may also be used in a stand-alone configuration, suitable for smaller CODA systems and bench testing.



Figure 3: The outside (left) and inside (right) view of the test setup. The outside view shows the PCIe TI with the blue fiber connection to the TS and the twisted pair connections for triggers. The inside view shows the FPGA (with the blue heat sink). The 10 Gb card for data transfer can be seen in both pictures with one of the links connected to the SRU.

A server using the PCIe TI is an optimal choice because of its potential for using multiple parallel threads to perform the software pedestal suppression. It also provides the option to support 10 Gb cards and/or multiple 1 Gb cards (in case of SRU failure).

A stable version of the software and kernel driver for the PCIe TI has been released and provided to Sergey Boyarinov for integration into the CLAS12 DAQ. The original software for configuration and readout of the FEC was written to interface with a small program from the DATE [5] software from CERN. System calls were made to this program with the name of a configuration file as its only argument. The configuration file specified the IP address and port to use to communicate with the FEC, followed by the values to write to specific registers. To communicate with several FECs, several configuration files were required.

The new software, developed by the JLab DAQ group, is written in C and provides access to the FEC without the use of system calls, which speeds up this operation. Subroutines were written to iterate over the initialized FECs and read them out in the same manner. Fig. 3 shows photographs of the setup used to integrate and test SRS with CODA.



Figure 4: The observed readout rate for 12 (red) and 9 (blue) channels per FEC as a function of the pseudo-random input trigger rate for a 1 FEC card configuration. The tests were performed with (solid) and without (dashed) buffering.

Tests using the software without buffering, on a configuration with 3 FEC cards that mimics the PRad DAQ setup, have shown that the front-end deadtime of the FEC is less than ~180 $\mu$s when reading 12 channels per FEC (i.e., 12 APV cards per FEC). This allows for a fixed-rate trigger of greater than 5 kHz. Using a pseudo-random pulser at a rate of 15 kHz, the accepted trigger (L1A) rate is found to be about 4.2 kHz using the 10 Gb transfer from the SRU to the DAQ computer. When the number of APVs is reduced to 9 per FEC, the front-end deadtime is reduced to 133 $\mu$s, which would allow for even higher rates. The PRad experiment will use 4 FEC cards configured to read 9 APVs per FEC. These tests were later repeated after the buffering was implemented. With buffering, the front-end deadtime was reduced to 10 $\mu$s.

The results of the test with just 1 FEC card are shown in Fig. 4. The results without buffering (dashed lines) when reading 12 channels (APVs) per FEC are shown in red and 9 channels per FEC are shown in blue. These readout rates for data from the SRU are consistent with a front-end deadtime of 180 $\mu$s for 12 channels per FEC and 133 $\mu$s for 9 channels per FEC. The results with buffering (solid lines) are also consistent with the 10 $\mu$s front-end deadtime. With buffering we were able to achieve deadtimes < 15% at the desired 5 kHz trigger rate.
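The relation between the quoted front-end deadtimes and the trigger rates can be illustrated with a standard non-paralyzable deadtime model. This is an assumption made here for illustration: the text does not state which model was used, and a buffered system is only approximately described by a single fixed deadtime. For a fixed-rate trigger the maximum rate is $1/\tau$; for a random (Poisson) trigger of rate $r$ the accepted rate is $r/(1 + r\tau)$.

```python
def max_fixed_rate_khz(tau_us: float) -> float:
    """Highest fixed-rate trigger accepted without loss: 1/tau, in kHz."""
    return 1e3 / tau_us


def accepted_rate_khz(input_khz: float, tau_us: float) -> float:
    """Accepted rate for a random trigger (non-paralyzable model), in kHz."""
    r_tau = input_khz * tau_us * 1e-3   # kHz * us = 1e-3 (dimensionless)
    return input_khz / (1.0 + r_tau)


def dead_fraction(input_khz: float, tau_us: float) -> float:
    """Fraction of random triggers lost to deadtime."""
    r_tau = input_khz * tau_us * 1e-3
    return r_tau / (1.0 + r_tau)


if __name__ == "__main__":
    for tau in (180.0, 133.0, 10.0):
        print(
            f"tau = {tau:5.0f} us: fixed max = {max_fixed_rate_khz(tau):6.2f} kHz, "
            f"accepted at 15 kHz random = {accepted_rate_khz(15.0, tau):5.2f} kHz, "
            f"dead fraction at 5 kHz = {dead_fraction(5.0, tau):.3f}"
        )
```

In this model a 180 $\mu$s deadtime allows a fixed rate of about 5.6 kHz and an accepted rate of about 4.1 kHz for a 15 kHz random input, in line with the roughly 4.2 kHz measured for a deadtime somewhat below 180 $\mu$s. A 10 $\mu$s deadtime gives a loss of about 5% at 5 kHz, comfortably within the < 15% figure.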

### 3.1 Ongoing Efforts

Zero suppression (sparsification) at the firmware or software level has not been implemented in the current setup. Firmware-based zero suppression is not planned at this time. However, a software algorithm for zero suppression, with associated code developed for the SRS, already exists. It was developed by Kondo Gnanvo of UVa and will be ported into the PRad DAQ. It will help reduce the amount of data written to disk. This work is expected to be completed in the next few weeks.
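To make the idea of software zero suppression concrete, the sketch below shows a generic pedestal-subtraction and threshold cut. It is not the UVa algorithm referred to above, whose details are not given here; the function name, the 5-sigma threshold, the data layout, and the assumption of positive-going signals are all illustrative choices, and common-mode correction is omitted.

```python
from statistics import mean


def zero_suppress(adc, pedestal, noise, n_sigma=5.0):
    """Keep only strips whose mean pedestal-subtracted ADC exceeds a threshold.

    adc      : list of per-strip lists of time samples (raw ADC counts)
    pedestal : list of per-strip pedestal means
    noise    : list of per-strip pedestal RMS values
    n_sigma  : threshold in units of the strip noise (assumed value)

    Returns a dict {strip index: pedestal-subtracted samples} for hit strips.
    """
    hits = {}
    for strip, samples in enumerate(adc):
        corrected = [s - pedestal[strip] for s in samples]
        if mean(corrected) > n_sigma * noise[strip]:
            hits[strip] = corrected
    return hits


if __name__ == "__main__":
    # Three strips, three time samples each; only strip 1 carries a signal.
    adc = [[100, 102, 101], [100, 180, 150], [98, 99, 100]]
    pedestal = [100, 100, 100]
    noise = [2, 2, 2]
    print(zero_suppress(adc, pedestal, noise))
```

Only the strips that pass the cut would be written out, which is how such a step reduces the data volume relative to the 100% occupancy figures used elsewhere in this note.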

## 4 Summary

We have demonstrated the capability to read out the PRad GEM detectors at a 5 kHz trigger rate with < 15% deadtime. These results were obtained without zero suppression. We are continuing to improve the DAQ system by including zero suppression. We are on track to have a robust, well-tested DAQ system well before the May run.

## References

- [1] RD51 Collaboration, http://rd51-public.web.cern.ch/rd51-public/
- [2] A. Tarazona et al., JINST 9 T06004 (2014).
- [3] Description and Technical Information for the PCI-express Trigger Interface (TIpcie) Module, J. William Gu, Nov. 15, 2015.
  https://coda.jlab.org/drupal/system/files/pdfs/HardwareManual/TIpcie/TIpcie.pdf
- [4] Description and Technical Information for Version 4 Trigger Supervisor (TS) Module, J. William Gu, Apr. 13, 2015.
  https://coda.jlab.org/drupal/system/files/pdfs/HardwareManual/TS/TS.pdf
- [5] ALICE DAQ and ECS Manual, December 2010, ALICE-INT-2010-001. https://ph-dep-aid.web.cern.ch/ph-dep-aid/