JLab HPC Clusters

The HPC group is working with JLab's Theory group to deploy a sequence of high performance cluster prototypes in support of Lattice Quantum ChromoDynamics (LQCD). LQCD is the numerical approach to solving QCD, the fundamental theory of quarks and gluons and their interactions. This computationally demanding suite of applications is key to understanding the experimental program of the laboratory.

Three clusters are currently deployed, leading up to much more significant resources in the coming years:
396 Node AMD InfiniBand Cluster: The latest cluster prototype consists of 396 nodes of AMD Opteron (quad-core) CPUs connected via DDR InfiniBand switched networks. Each node has two processors running at 1.9 GHz, 4 GB of DDR2 memory, and an 80 GB SATA disk. In addition, each node has a PCI Express (x16) slot for an InfiniBand HCA that provides 20 Gb/s bandwidth. The communication software is based on the MPI implementation mvapich-0.9.9 from OSU, coupled with OFED 1.2.5 from Mellanox.
280 Node Intel InfiniBand Cluster: This cluster prototype, deployed in 2006, consists of 280 nodes of Intel Pentium D (dual-core) CPUs connected via InfiniBand switched networks. Each node has a single processor running at 3.0 GHz with an 800 MHz front side bus, 1 GB of memory, and an 80 GB SATA disk. In addition, each node has a PCI Express (x4) slot for an InfiniBand HCA that provides 10 Gb/s bandwidth. The communication software is based on the MPI implementation mvapich-0.9.7 from OSU, coupled with IBG2 from Mellanox.
384 Node Intel Cluster: The previous cluster prototype consists of 384 nodes arranged as a 6x8x2^3 mesh (torus). The nodes are interconnected using three dual gigE cards plus one half of the dual gigE NIC on the motherboard (the other half is used for file services). This 5D wiring can be configured as a 3D torus in various ways (running in 4D or 5D is possible in principle, but is less efficient and so is not used). Each node has a single 2.8 GHz Xeon processor, an 800 MHz front side bus, 512 MB of memory, and a 36 GB disk. With the improved Domain Wall algorithm (optimized assembly code), this cluster will achieve approximately 0.7 teraflops sustained.

Message passing on this novel architecture is done using an application-optimized library, QMP, for which implementations will also target other custom LQCD machines. The lowest levels of the communications stack are implemented using a VIA driver, and for transfers over multiple links, VIA data rates approaching 500 MB/sec/node have been achieved.
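The mesh arithmetic for the 384 node cluster can be checked with a short sketch. The Python below is illustrative only and is not part of QMP or any JLab software. The function names are hypothetical, and it assumes each distinct torus neighbor is reached over one dedicated gigE port. Under that assumption, a 6x8x2x2x2 torus needs seven ports per node, which is consistent with the three dual gigE cards plus one on-board port described above.

```python
from math import prod

# Mesh dimensions of the 384 node cluster as stated in the text: 6 x 8 x 2^3.
DIMS = (6, 8, 2, 2, 2)


def node_count(dims):
    """Total number of nodes in a mesh with the given extents."""
    return prod(dims)


def links_per_node(dims):
    """Number of distinct torus neighbors of any node.

    An axis of extent 1 has no neighbor. An axis of extent 2 wraps onto the
    same node in both directions, so it needs one link. Longer axes need two.
    """
    return sum(0 if d == 1 else 1 if d == 2 else 2 for d in dims)


def neighbors(coord, dims):
    """Sorted coordinates of the distinct torus neighbors of `coord`."""
    found = set()
    for axis, extent in enumerate(dims):
        for step in (-1, 1):
            other = list(coord)
            other[axis] = (other[axis] + step) % extent  # wrap around the torus
            if tuple(other) != tuple(coord):
                found.add(tuple(other))
    return sorted(found)


if __name__ == "__main__":
    print("nodes:", node_count(DIMS))
    print("links per node:", links_per_node(DIMS))
    for n in neighbors((0, 0, 0, 0, 0), DIMS):
        print("neighbor of origin:", n)
```

The sketch does not model how the 5D wiring is folded into a 3D torus for production running; it only reproduces the node and port counts.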



Decommissioned Intel Clusters

256 Node Intel Cluster: The cluster of 256 nodes deployed in 2003 is configured as either a 2x2x4x8. Nodes are interconnected using three dual gigE cards (one per dimension) and one on-board link, an approach which delivers high aggregate bandwidth while avoiding the expense of a high performance switch. Each node has a single 2.67 GHz Pentium 4 Xeon processor and 256 MB of memory. The cluster achieves 0.4 teraflops on LQCD applications. All nodes run RedHat 9 with kernel version 2.4-27 and improved NFS support.

Additional information can be found in an Intel case study (pdf file).

128 Node Intel Cluster: The oldest cluster, now decommissioned, was installed in 2002. It contained 128 single-processor 2.0 GHz Xeon nodes connected by a Myrinet network. It achieved 1/8 teraflops (Linpack), contained 65 GB of total physical memory, and delivered 270 MB/s of simultaneous node-to-node aggregate network bandwidth. All nodes ran RedHat Linux Release 7.3 with kernel version 2.4-18.

Additional configuration information.