SZIN Macro Package

Date: January 2, 2002

Introduction

How to build

Machine architecture

Fermion types

Programs

Description

Parameters

Spin conventions

Output

Directories

Program constructs

Macro operations

Datatype structure

Example

Getting the source via CVS

Introduction

The SZIN software system (SZIN means color in Hungarian) supports data-parallel programming constructs for lattice field theory, and in particular lattice QCD. It is an object-oriented, macro-based C system that presents a single high-level code image to the user, but can generate highly optimized code for many architectures, including single-node workstations, multi-process or multi-threaded SMP workstations, clusters of workstations via MPI or GM (Myrinet), the (now defunct) CM-2, vectorizable Crays, and the QCDSP.

 

This manual can be found at    http://www.jlab.org/~edwards/szin

 

How to build

To install, first run "configure". Look at the top of the configure file, which describes how MACHINE_TYPE's and FERMION_TYPE's are used. To initialize for an Intel box running Linux, with code that is "scalar" only (a single box with one processor), you would run

 

perl configure MACHINE_TYPE=linux

 

Then, go into the mainprogs/main or mainprogs/tests directory. To build for Wilson fermions, do

 

gmake MACHINE_TYPE=linux FERMION_TYPE=w linux_t_mesplq_w

 

which builds the simple plaquette-measurement test program. The general syntax for the build of a main program is as follows:

 

    gmake MACHINE_TYPE=<type> FERMION_TYPE=<fermtype>  <type>_<main prog name>_<fermtype>

 

where <main prog name> is one of the files in the mainprogs/tests or mainprogs/main directories.

Machine architecture

The MACHINE_TYPE is a string describing the compiler and system architecture. For example, <type>=linux is single-box, single-processor code for a (nominally) x86-class processor running Linux. More about the known MACHINE_TYPE's is below.

Currently <type> can be one of the following. Note, case is significant.

 

The allowed fermion types <fermtype> are

 

Supported architecture types are

 

Fermion types

The FERMION_TYPE is one of "w" (Wilson-like fermions, namely 4 spin components), "s" (staggered-like fermions, 1 spin component), and "g" (pure gauge). The latter is obviously not a fermion type, but is used when linking against pure-gauge-only main programs and measurements.

Programs

Description

Generically, the "szin/mainprogs/main" files are drivers around some work routine. In the case of the spectro programs, there is a spectroN_w.m and a corresponding hadronN_w.m.

Spectro

The first simple spectroscopy version, using only wall and point sources.

 

Spectro2

NOTE: spectro2 and spectro4 are the preferred S-wave spectroscopy programs

 

S-wave meson, meson currents, and baryon spectroscopy.

A souped-up spectro.m. Added all the FFT-based machinery for shell sources via smearing (in momentum space via FFT), boundstate sources, heavy-light, and same-valence but mixed smearing at source and sink, for use in smearing-matrix constructions. All S-wave stuff. Urs put a lot of work into it, and we used it as a major workhorse for all our spectrum work. The real work routine is hadron2_w.m. It is rather big and, in my opinion, unwieldy.

Spectro4

NOTE: spectro2 and spectro4 are the preferred S-wave spectroscopy programs

 

The FFT stuff is not very portable. The latest and preferred (?) S-wave spectroscopy program is spectro4. It is an updated and stripped-down spectro2; I nuked bound-state sources and sinks, since that turned out to be a suboptimal approach.

Spectro3

P-wave meson sources & sinks. So, can do the a_1, etc.

 

The measurement routines are in

szin/lib/meas/hadron

 

Also, there are "stripper" scripts (perl scripts) that I am putting into szin/scripts; they strip out the data and dump it into some directory. The stripper names match the mainprog names. They all read stdin, so I cat or zcat in the whole pile of namelist files; each script runs through them, pulling out the zillions of channels in each config and recombining them into unique files.

 

Parameters

Globals

Most of the important parameters are global variables. They can be found in szin/macros/common_declarations.h. Here is a description of some of the variables.

 

Fundamental

Nd                   number of dimensions

Nc                   number of color components

Ns                   number of spin components

FermType        global var describing the type of fermions: Wilson like, staggered like, or pure gauge. Wilson like means 4 spin components, so Wilson, Clover, Overlap, etc. Staggered like means 1 spin component, like Naik, and pure gauge is effectively Ns=0.

 

Fermions

FermAct           global var holding the fermion action. The file macros/primitives.mh holds the relevant values. "OVERLAP_POLE" is what I use for the overlap (the Neuberger construction), but using our Remez approximation to epsilon. "TRUNC_OVERLAP" is Neuberger's version.

OverAuxAct     the auxiliary action used in the overlap operator. By default it is the hermitian (unpreconditioned) Wilson operator, gamma_5*(1 - kappa*D_wilson). Note the kappa normalization.

NwilsVec            number of vectors for projection. Affects various overlap versions.

RatPolyDeg      the name and variable space were taken from Stefan/Ivan/Tony's rational polynomial approximation stuff. Here, it is the order of the polynomial in x*P(x^2)/Q(x^2), so P(y) ~ const*y^{RatPolyDeg} + ... . For OVERLAP_POLE I set it to 1. For TRUNC_OVERLAP and dwf-type stuff it functions as L_s.

PolyArgResc    for TRUNC_OVERLAP and dwf it acts like a_5, the scale factor inside the epsilon or the 5-d lattice spacing.

PolyRangUp     for OVERLAP_POLE. The maximum eigenvalue of OverAuxAct. The largest of the smallest eigenvalues should be within a suitable range of the 14th-order Remez approximation used.

PolyRangLow  for OVERLAP_POLE. The minimum eigenvalue of OverAuxAct.

 

A caveat about OVERLAP_POLE: in typical use, projection is always used.

NWilsVec can be -1 to compute one low-lying (and the maximum) eigenvalue, but not use the low ones in projection. The main goal there is to establish rescaling.

The largest eigenvalue of OverAuxAct is rescaled to 4 (this is all in conslinop_w.m). This is a little aggressive, but 4 is considered the maximum eigenvalue acceptable in our 14th-order Remez approximation.

 

Spectroscopy

The kinds of smearing available (in spectro2 and spectro4) are classed like:

 

Z3_src             Z(3)-style sources (a crystal). This can replicate the same source across many sites, but with a random Z(3) phase. Works well in really big volumes.

Pt_src/snk            point source/sink

Wl_src/snk            wall source/sink

Sl_src/snk            shell source/sink

Bd_src/snk       boundstate source/sink. A boundstate is a meson or baryon with one quark a point source and the other quark a shell source.

 

Within shell-source constructions (including boundstate), different wavefunction constructions are used. I often thought of having a point wave function and a wall wave function, but the latter doesn't work mathematically (nicely, at least).

 

Wvf_kind            wave function kind:

    (1 = GAUSSIAN_WVF) gauge-variant Gaussian wvf.

    (2 = EXPONENTIAL_WVF) gauge-variant exponential wvf

    (3 = GAUGE_INV_EXPONENTIAL_WVF) gauge-invariant gaussian wvf

    (4 = WUPPERTAL_WVF) wuppertal

 

There are also parameters for the shell-like smearing. In the gauge-variant gaussian exponential, the smearing function is exp(-p^2/2*width). [I need to double check the 2]. The gauge-covariant version exponentiates via (1 - laplac(U)*width/N)^N .
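As a numerical illustration of the covariant form, here is a standalone C sketch (not SZIN code; the scalar x stands in for an eigenvalue of the covariant Laplacian acting on a mode): the iterated kernel (1 - x*w/N)^N approaches the exponential exp(-x*w) as N grows.

```c
#include <assert.h>
#include <math.h>

/* Iterated smearing kernel for a scalar stand-in x of the Laplacian:
 * computes (1 - x*w/n)^n, which tends to exp(-x*w) for large n. */
double iterated_smear(double x, double w, int n)
{
    double f = 1.0;
    for (int i = 0; i < n; ++i)
        f *= 1.0 - x * w / n;
    return f;
}
```

This is why WvfIntPar (the iteration count) controls how closely the covariant smearing tracks the exponential profile with the same width.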

 

For both gauge-variant and gauge-invariant smearing, one reads

 

wvf_param       array of the smearing widths. The array has one element matching each kappa in the list.

WvfIntPar            array of the number of smearing iterations (used for covariant smearing)

 

The smearing width has the same meaning in both invariant and variant cases.

Spin conventions

The spin conventions are contained in the macro file spinor.mh. The default spin conventions are the same as the MILC code and also the Columbia Physics System. The basis is a chiral one. The gamma matrices are below.

 

Here is the basis. Note, they are labeled gamma_{1,2,3,4} for x,y,z,t in that order. You could also think of x as the 0th direction, so they run from 0 to 3, inclusive.

 

# gamma(   1)   0  0  0  i            # ( 0001 )  --> 1

#               0  0  i  0

#               0 -i  0  0

#              -i  0  0  0

 

# gamma(   2)   0  0  0 -1            # ( 0010 )  --> 2

#               0  0  1  0

#               0  1  0  0

#              -1  0  0  0

 

# gamma(   3)   0  0  i  0            # ( 0100 )  --> 4

#               0  0  0 -i

#              -i  0  0  0

#               0  i  0  0

 

# gamma(   4)   0  0  1  0            # ( 1000 )  --> 8

#               0  0  0  1

#               1  0  0  0

#               0  1  0  0

 

 

People often ask about changing the spin basis. The basis is a chiral one and is changeable in the macros. It is not a trivial change, such as flipping a parameter to turn on a different basis (I'd like to switch to that, though). Moreover, specially written optimized codes for the Dirac operator have the basis hardwired in - they are low-level codes! Probably more of a pain are the baryons. The baryon wave functions and output are basis dependent. One can always do a similarity transformation (e.g., full propagators can always switch bases); however, all the baryon spin source and sink components need to be saved to do this rotation, and they are not all currently saved.

 

The way we do mesons is enumerate the possible gamma matrix products

Gamma = gamma_1^n1 * gamma_2^n2 * gamma_3^n3 * gamma_4^n4

 

where the n_i are single-bit fields. Since gamma_1 comes first, its bit comes first (least significant). So then, for example,

gamma_5 is represented as  1111b = 15d .

gamma_1*gamma_2*gamma_4 is represented as  1011b = 11d (note the ordering).

 

This enumeration is gamma basis independent.
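In plain C the enumeration is just a 4-bit pack, with the gamma_1 bit least significant (a sketch, not SZIN code):

```c
#include <assert.h>

/* Index of Gamma = gamma_1^n1 * gamma_2^n2 * gamma_3^n3 * gamma_4^n4,
 * where each n is 0 or 1 and the gamma_1 bit is least significant. */
int gamma_index(int n1, int n2, int n3, int n4)
{
    return (n1 << 0) | (n2 << 1) | (n3 << 2) | (n4 << 3);
}
```

So gamma_5 = gamma_1*gamma_2*gamma_3*gamma_4 lands at index 15, and gamma_1*gamma_2*gamma_4 at index 11, exactly as above.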

 

Output

Spectroscopy

First, mesprop and bar_prop are the simple 2-point same-source-and-sink spin constructions (up to a possible gamma_3, assuming here that gamma_3 (0-indexed) is the time direction). The currents are for when you do non-local constructions, like building a conserved vector current for Wilson fermions (use a gauge field to hop one site). Axial is similar but partially conserved.

The objects one wants for a conventional mass analysis are the mesprop and barprop objects. The "array" indices used for all these things have a Fortran ordering - namely, the first to last index runs fastest to slowest varying.

For mesons (you can see this in szin/lib/meas/hadron/wrimes_w.m):

 

mesprop(time_extent, Ns*Ns)

mesdisp(time_extent, num_momentum_channels, Ns*Ns)
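The column-major (Fortran) linearization of these arrays can be sketched in C as follows; the function name is illustrative, not SZIN's:

```c
#include <assert.h>

/* Fortran (column-major) linear index for mesprop(time_extent, Ns*Ns):
 * the first index, time, varies fastest. */
int mesprop_index(int t, int gamma_idx, int time_extent)
{
    return t + time_extent * gamma_idx;
}
```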

 


 

Only 3-momenta are considered, and they are enumerated in a packed order to form an array index. Currently, all relevant permutations of a 3-momentum are averaged over. E.g., (0,0,0) is index 0, (1,0,0) is index 1, (1,1,0) is index 2, and so on up in |p|^2. The jump is at 9 (the first degenerate p^2).
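The claim that 9 is the first degenerate |p|^2 can be checked with a standalone C sketch (not SZIN code) that enumerates canonical momenta n1 >= n2 >= n3 >= 0:

```c
#include <assert.h>

/* First p^2 value realized by more than one canonical momentum
 * (n1 >= n2 >= n3 >= 0): e.g., 9 = (3,0,0) = (2,2,1). */
int first_degenerate_psq(void)
{
    int count[32] = {0};   /* count[p^2] over canonical triples */
    for (int n1 = 0; n1 <= 5; ++n1)
        for (int n2 = 0; n2 <= n1; ++n2)
            for (int n3 = 0; n3 <= n2; ++n3) {
                int psq = n1*n1 + n2*n2 + n3*n3;
                if (psq < 32)
                    count[psq]++;
            }
    for (int psq = 0; psq < 32; ++psq)
        if (count[psq] > 1)
            return psq;
    return -1;
}
```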

Directories

File name conventions

 

Main programs

All main programs link against <type>_szin.a, <type>_szin_[wsg].a and <type>_szin_x.a. These libraries are kept in the "lib" directory. The first library, <type>_szin.a, contains fermion-independent code - namely, code that might use fermions (like hybrid.m, the HMC code) but doesn't need to know exactly which fermion. Heat-bath codes are in here. The <type>_szin_[wsg].a is fermion dependent, like hadron measurements where operators are specific to the fermion type. The <type>_szin_x.a is the "external" library. The way SZIN works is that each M4 macro call on a lattice object produces a "C" (or possibly assembly) file which lives in lib/ext/ext_<type>. These files have hashcoded names to reduce their length to something linkers are happy with.

Main

The directory mainprogs/main contains the main physics codes.

Tests

The directory mainprogs/tests contains various test programs used for debugging and code verification.

Tests

The tests directory contains sample namelist input and namelist output files for various programs. More to come. The names of the subdirectories of tests correspond to program names in the mainprogs/main and mainprogs/tests directories. There is a “*.ini” or “*.ini0” file; this is a namelist input file that is read in. Look at the file for instructions and sample input. The “.ini” file is copied onto the DATA file. This file is read by the programs and may name a config file; the programs write NMLDAT, RESULT, and possibly new DATA and CFGOUT files.

Macros

The SZIN macros are kept in macros

What defines SZIN are the macros used for preprocessing. A casual user doesn't need to dive into these things. However, there are useful macro constants defined in macros/primitives.mh that are used by many programs - constants like the number selecting the CG inverter are in this file. The "makestuff" directory contains stuff used by make (not surprisingly). The "root" directory contains the true main programs; these in turn call the "mainprogs". The true main programs are used to set up environments, etc. They have names like stand_main.c, smp_main.c, and tsmp_main.c, which respectively set up the environments for generic single-processor code (stand_main.c), multi-process SMP code (possibly multi-node as well) (smp_main.c), and threaded SMP code (possibly multi-node as well) (tsmp_main.c).

Lib

Actions

Construct

Linear operators

 

The directory containing the known linear operators is

 

szin/actions/linop

 

and

 

szin/actions/construct

 

is the location of the constructors/destructors. These constructors are just convenient packaging of the above code for a linear operator. Otherwise, there is nothing fundamental about them. They are, however, called by the Qprop routines.

 

The program conslinop_w.m has all the currently used linear operators. There may be some more in the szin/actions/linop directory than are used in conslinop_w.m (there are also staggered ones). By convention, all the routines begin with "l".

 

ldpsi_w.m       - The unpreconditioned Wilson-Dirac operator

lovddag_w.m     - The overlap-Dirac (pole approx) of  D^dag*D. This uses

   an auxiliary operator (inside the epsilon()) that is

   often the unprec. Wilson-Dirac operator.

lovlapms_w.m    - The overlap  "D".

ldwf_w.m        - Unprec. dwf operator (5D).

 

For the preconditioned DWF (5D) Shamir operator, many routines come into play. The operator itself is leompsi_w.m. It also uses ldwfproj_w.m, ldwfdiag_w.m, ldwfainv_w.m, and dwfahat_w.m. Without projection, many of these routines are trivial.

Dirac

Dslash

Gauge

Invert

Linop

Qprop

Quark propagator routines. The driver is Qprop, which calls the appropriate quark propagator routine based on the value of FermAct.

Meas

Eig

Gfix

Glue

Hadron

Pbp

Schrfun

Smear

Geom

Routines that set up the geometry and boundary conditions for the code. The geometry (defined in routines setgeom.* and setgeom32.*) refers to how the Nd-dimensional grid is laid out in memory and across multiple nodes.

The boundary conditions (routines setbc.* and setbc32.*) set things like offsets used for communications within memory and between nodes.

The code is strictly a checkerboarded code, and the state is contained in the variable Nsubl. Namely, there can be 2 checkerboards (even/odd), and in Nd dimensions there can be 2^Nd sublattices.
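For the even/odd case, the checkerboard of a site is just its coordinate sum mod 2; here is a standalone C sketch (not SZIN code):

```c
#include <assert.h>

/* Site parity for a 2-checkerboard (even/odd) layout:
 * 0 = even checkerboard, 1 = odd checkerboard. */
int checkerboard(const int x[], int nd)
{
    int sum = 0;
    for (int mu = 0; mu < nd; ++mu)
        sum += x[mu];
    return sum & 1;
}
```

Nearest-neighbor shifts always move a site from one checkerboard to the other, which is why the NEIGHBOUR macro below relates source and target on opposite checkerboards.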

Scripts

Namelist_diff            - a slick namelist file differ

Build_vpath            - constructs the vpath used in lib/Makefile

 

The rest of the scripts strip namelist data and write it into files in a specified directory. Generically, the file names match the program they are designed to strip and possibly output for a specific routine

 

All the strippers take in stdin, so I cat or zcat in the whole pile of namelist files and it runs through them pulling out the zillions of channels in each config and recombines them into unique files.

 

This really should all go away, with the data output in XML or the like. Still, you need to know where to put the stuff and what to name it on disk. More to be done.

Program constructs

Input/Output

There are three classes of input/output, called NAMELIST, BINARY and TERMINAL. The relevant operations are below. Note, the file handles (buffers) are explicit in the open/create/close instructions and implicit in the write/read instructions.

 

CREATE_NAMELIST(nml_out,  “A_namelist_file”);

WRITE_NAMELIST(<group>, <var1>,…,<var n>);

CLOSE_NAMELIST(nml_out);

 

OPEN_NAMELIST(nml_in,  “A_namelist_file”);

READ_NAMELIST(<group>, <var1>,…,<var n>);

CLOSE_NAMELIST(nml_in);

 

CREATE_BINARY(cfg_out,  “A_binary_file”);

WRITE_BINARY(<var1>,…,<var n>);

CLOSE_BINARY(cfg_out);

 

OPEN_BINARY(cfg_in,  “A_binary_file”);

READ_BINARY(<var1>,…,<var n>);

CLOSE_BINARY(cfg_in);

 

CREATE_TERMINAL(trm_out,  “A_terminal_file”);

WRITE_TERMINAL(<var1>,…,<var n>);

CLOSE_TERMINAL(trm_out);

 

OPEN_TERMINAL(trm_in,  “A_terminal_file”);

READ_TERMINAL(<var1>,…,<var n>);

CLOSE_TERMINAL(trm_in);

 

The way data (namely parameters) is read in and written out is via “NAMELIST”. This format is taken over from Fortran and has a group structure like:

 

&param

  var = 3.7,

  This_is_a_list = (1.0, 2, 3),

  This_is_complex = (1, 2),

  A_string = “this is a string”,

  This_is_another_list(0) = 1.0,

  This_is_another_list(1) = 2,

  This_is_another_list(2) = 3,

&END

 

The names are representative of the types. The data is read using code like

 

REAL(var);

REAL(This_is_a_list, 3);

REAL(This_is_another_list, 3);

COMPLEX(This_is_complex);

STRING(A_string);

 

<do some allocates and other stuff>

 

OPEN_NAMELIST(nml_in, “some_file”);

READ_NAMELIST(param, A_string,var,This_is_a_list,This_is_another_list);

READ_NAMELIST(param,This_is_complex);

CLOSE_NAMELIST(nml_in);

 

Note: the parser does not assign types to the expressions parsed. When READ_NAMELIST is called on the individual variable, the expected type is known and the data is converted to that type. By default, all variables are `0’, so if there is an error on reading (e.g., the variable is not in the namelist file, or an integer type is read but the data was a string), then the default is used. Also, the order of the parameter entries and even the order of the namelist groups is not important. Currently, the entire namelist file is parsed, then READ_NAMELIST looks for a specific group and a specific variable. It is an error to have the same group twice, or the same variable name entered twice within the same group.

 

Writing data is similar, but now a single use of WRITE_NAMELIST(<group>, <vars>,…) writes the whole group. Writing 2 instances of WRITE_NAMELIST with the same group name makes two instances of that group. The output file is buffered in the usual file I/O buffering way. To open a file and to write something using the above declarations would look like

 

CREATE_NAMELIST(nml_out, “another_file”);

WRITE_NAMELIST(stuff, A_string, This_is_complex);

CLOSE_NAMELIST(nml_out);

 

Reading and writing binary data is what you expect. The following

 

CREATE_BINARY(cfg_out, “a_binary_file”);

WRITE_BINARY(This_is_complex,This_is_another_list);

CLOSE_BINARY(cfg_out);

 

OPEN_BINARY(cfg_in, “a_binary_file”);

READ_BINARY(This_is_complex,This_is_another_list);

CLOSE_BINARY(cfg_in);

 

writes and then reads in the variables “This_is_complex” and “This_is_another_list”. Since the underlying code is “C”, one should not expect record marks or other such antique things in the binary file. On all platforms, the binary format is always “big-endian” – namely, the byte ordering of SUN, HP, and IBM RIOS machines, and the opposite of Alpha and Intel.
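When the host byte order differs from the fixed file format, each word has to be byte-swapped on the way in or out. A standalone C sketch (not the SZIN I/O code) of the idea:

```c
#include <assert.h>
#include <stdint.h>

/* returns 1 on a big-endian host, 0 on a little-endian host */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* reverse the bytes of one 32-bit word */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}
```

A reader on a host whose order matches the file format copies words through unchanged; otherwise it applies bswap32 to every word.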

 

 

Linear Operator

In the language of Lisp, the LINEAR_OPERATOR acts as a "funarg" constructor. The idea here is that all the linear operators, namely things like applying the preconditioned Wilson-Dirac operator on a vector (a Lattice_Fermion) all look generically the same. Namely, they take an input vector, some structure holding info like gauge fields, the number of checkerboards it acts on, the "sign" of the operator (whether it is D or D^dagger), and then the output. The LINEAR_OPERATOR holds all the additional arguments to pass in including the address of the function. The syntax of the linear operator constructor is

 

CONSTRUCT_LINEAR_OPERATOR(A, <function name>, <arg 1>, <arg 2>, ..., <arg n>);

 

Here is a skeleton outline of syntax mixed with code usage:

 

 

LINEAR_OPERATOR(A);  /* this is the declaration */

ALLOCATE(A);         /* reserves memory for the structure itself */

 

/* Possibly allocate space for some of the args below */

ALLOCATE(arg1);  /* Note, this may not be used/needed/wanted */

 

/* This constructs the operator (or rather assigns parts of A to the args) */

CONSTRUCT_LINEAR_OPERATOR(A, <function name>, <arg 1>, <arg 2>, ..., <arg n>);

 

/* The syntax for calling A */

CALL(A, <some linear operator holding info, usually A>, <args for A>);

 

/* In the code, the convention is always */

CALL(A, A, <input fermion>, <output fermion>, <(int)# checkerboards>,<(int)sign>);

 

/* Inside, the code for the linear operator, you'll need the args */

/* Get the args by unpacking */

/* Note, the args correspond in order and type (but not necessarily name) */

/* to the args in the construct_linear_operator */

UNPACK_LINEAR_OPERATOR(A, <arg 1>, <arg 2>, ..., <arg n>);

 

/* Then use code as expected */

 

 

/* Possibly free space for some of the args above */

/* Note, since the memory allocator is stack based, one must undo */

/* the allocations in the correct (opposite) order */

FREE(arg1);

 

/* Free the space reserved by the linear operator */

FREE_LINEAR_OPERATOR(A); 

 

/* Free the linear operator */

FREE(A);

 

 

Check out szin/lib/actions/construct/conslinop_w.m and look for the CLOVER case.

 

The implementation of LINEAR_OPERATOR can be found in szin/macros/linop.mh. It is a structure with an element for the address of the function and a void ** pointer for places to hold the memory. ALLOCATE(A) allocates this structure, then CONSTRUCT_LINEAR_OPERATOR allocates the internal arrays and assigns values (from the arg pointers) to the elements. In the case of a scalar variable passed in, memory is allocated and the value of the scalar variable is copied to this new location; the address of that location is used like the other pointers.
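The funarg idea can be sketched in plain C (illustrative only - the real implementation is the macro set in szin/macros/linop.mh): a struct bundles the function address with copies of the constructor arguments, including a heap copy of any scalar.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LinOp LinOp;
typedef void (*linop_fn)(LinOp *A, const double *in, double *out, int n);

struct LinOp {
    linop_fn fn;     /* address of the work routine */
    void    *args;   /* captured constructor argument(s) */
};

/* example work routine: out = scale * in, with scale captured in args */
static void scale_op(LinOp *A, const double *in, double *out, int n)
{
    double scale = *(double *)A->args;   /* UNPACK analogue */
    for (int i = 0; i < n; ++i)
        out[i] = scale * in[i];
}

/* CONSTRUCT_LINEAR_OPERATOR analogue: copy the scalar arg into owned storage */
LinOp *construct_scale_op(double scale)
{
    LinOp *A = malloc(sizeof *A);
    double *p = malloc(sizeof *p);
    *p = scale;
    A->fn = scale_op;
    A->args = p;
    return A;
}

/* CALL analogue */
void call_linop(LinOp *A, const double *in, double *out, int n)
{
    A->fn(A, in, out, n);
}

/* FREE analogues: release in the opposite order of allocation */
void free_linop(LinOp *A)
{
    free(A->args);
    free(A);
}
```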

 

Macro operations

The currently defined macro operations are

 

 

ALLOCATE:

Memory stack allocate

    ALLOCATE(<arg>)

where <arg> has been defined.

 

 

FREE:

Memory stack free

    FREE(<arg>)

where <arg> has been defined.

 

 

BINARYOP:

Add or subtract two fields and write onto a target

    BINARYOP ( <source1>, <source2>, <target>, <option> )

 

Where <option> :== ADD | SUBTRACT

 

BINOPS:

    <function> ( <source1>, <source2>, <target> )

 

where <function> := MOD, MIN, MAX, DIVIDE, AND, OR, XOR

 

CAST:

Makes explicit the indices of a primitive type, or as an overloading hides the indices. This is really an ugly macro and is sort of a last resort for changing types. It is not meant to be really fast.

    CAST ( <source>, <target> )

 

Ex.  Matrix(Nc,Nc) -> Array(Nc,Nc)

or    Array(Nc,Nc)  -> Matrix(Nc,Nc)

 

 

 

#+

# CMPLX operations:

#   CMPLX ( <source1>, <source2>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# COERCE operations:

#   <function> ( <source>, <target> )

#

# where <function> := FLOAT, INT, BOOL, FLOOR,

#                     CEILING, TRUNCATE, ROUND

#-

 

 

#+

# COMPARE operations:

#   <function> ( <source1>, <source2>, <target> )

#

# where <function> := LT, GT, LE, GE, EQ, NE

#

 

 

#+

# CONJG operations:

#   CONJG ( <source>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

CONTRACT:

Contract 2 propagator fields onto a target field.

 

    CONTRACT ( <source1>, <source2>, <target>, <option> )

 

where <option> :== ADD | SUBTRACT | NEGATE | REPLACE

 

The sources and targets must all be propagators but not necessarily of the same lattice type. Effectively, one can use this to construct an anti-quark from a di-quark contraction. In explicit index form, the operation does

 

target^{k' k}_{\alpha\beta} <option>=

   \epsilon^{i j k}\epsilon^{i' j' k'}  source1^{i i'}_{\rho \alpha}  source2^{j j'}_{\rho \beta}

 

and is only appropriate for Nc=3  (or SU(3)).
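This epsilon-epsilon contraction can be sketched directly for Nc=3 in standalone C (not SZIN code; real propagators are complex, but real entries suffice to show the index structure, and the REPLACE option is assumed):

```c
#include <assert.h>

#define NC 3
#define NS 4

/* epsilon^{ijk}: +1 for cyclic permutations of (0,1,2), -1 for
 * anticyclic, 0 if any two indices coincide */
static int eps(int i, int j, int k)
{
    if (i == j || j == k || i == k)
        return 0;
    return (j == (i + 1) % 3) ? 1 : -1;
}

/* tgt[k'][k][alpha][beta] = eps^{ijk} eps^{i'j'k'}
 *     s1[i][i'][rho][alpha] * s2[j][j'][rho][beta]   (REPLACE) */
void contract(double s1[NC][NC][NS][NS], double s2[NC][NC][NS][NS],
              double tgt[NC][NC][NS][NS])
{
    for (int kp = 0; kp < NC; ++kp)
        for (int k = 0; k < NC; ++k)
            for (int a = 0; a < NS; ++a)
                for (int b = 0; b < NS; ++b) {
                    double sum = 0;
                    for (int i = 0; i < NC; ++i)
                        for (int j = 0; j < NC; ++j)
                            for (int ip = 0; ip < NC; ++ip)
                                for (int jp = 0; jp < NC; ++jp)
                                    for (int r = 0; r < NS; ++r)
                                        sum += eps(i, j, k) * eps(ip, jp, kp)
                                             * s1[i][ip][r][a] * s2[j][jp][r][b];
                    tgt[kp][k][a][b] = sum;
                }
}
```

A quick consistency check: contracting two color-spin identity "propagators" gives 2*delta_{kk'}*delta_{alpha beta}, from the identity sum_{ij} eps^{ijk} eps^{ijk'} = 2 delta_{kk'}.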

 


 

 

#+

# COORD operations: COORD ( <source>, <target> )

#

# where <source> := scalar integer indicating

#                   direction (between 0 and Nd-1)

# and   <target> := lattice integer filled with

#                   coord. along <source> dir.

#-

 

 

#+

# COPY operations:

#   COPY ( <source>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# COPYMASK operations:

#   COPYMASK ( <source>, <mask>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# FILL operations: FILL ( <target>, <value> )

#

# Ex. <value> can be a lattice scalar of the same

#     primitive type as <target> (a lattice type).

# Or  <value> can be a lattice scalar of a lower

#     primitive type of <target> (a lattice type),

#     so fill a matrix with a diagonal of <value>

# Or  <value> can be a special symbol like

#     ZERO,ONE,RANDOM,GAUSSIAN.  If <target> is a

#     Matrix, then filled with a diagonal matrix.

#     If <target> has Matrix primitive, then

#     RANDOM,GAUSSIAN fill the entire matrix

#-

 

 

#+

# INDEXING operations:

#   INDEXING ( <source1>, <coordinate>, <target> )

#-

 

 

#+

# IO operations: these come in three varieties,

# binary (machine-dependent configuration files),

# NAMELIST (portable machine-readable data files), and

# terminal (human but not machine readable).

#

# CREATE_BINARY(unit_num, "filename"),

# OPEN_BINARY(integer, "filename"),

# WRITE_BINARY, READ_BINARY, CLOSE_BINARY(unit_num)

#

# WRITE_NAMELIST

#

# READ_TERMINAL, WRITE_TERMINAL

#-

 

 

#+

# MULTIPLY

# 

# operations: MULTIPLY ( <source1>, <source2>, <target>, <option> )

# 

# The allowed forms of source1 and source2 are

#   source

#   CONJUGATE(source)

#   COMMUNICATE(source,<cb>,<sgn>,<dir>)

#   CONJUGATE(COMMUNICATE(source,<cb>,<sgn>,<dir>))

#   COMMUNICATE(CONJUGATE(source),<cb>,<sgn>,<dir>)

# 

# If a COMMUNICATE is used, then  <cb>  must be specified with the corresponding

#   sign and direction

#

# If both sources use a COMMUNICATE, then the CB must be the same

# between them. The sgn and dir are allowed to be different.

#

# CONJUGATE or COMMUNICATE is not allowed on the target

#

# The option must always be specified and is

# <option> :== REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# NEIGHBOUR_ALL ( <source>, <target>,

#                 <checker board>, <duplex> )

#

# If <duplex> = FORWARD or BACKWARD this is logically

# equivalent to

#

#   FOR <direction> FROM 0 THROUGH Nd-1 DO

#     { NEIGHBOUR ( <source>(<direction>),

#                   <target>(<direction>),

#                   <checker board>, <duplex>, <direction> ); }

#

# whereas if <duplex> = BOTH it is logically equivalent to

#

#   FOR <direction> FROM 0 THROUGH Nd-1 DO

#     { NEIGHBOUR ( <source>(<direction>,F_SWAP),

#                   <target>(<direction>,F_SWAP),

#                   <checker board>, FORWARD, <direction> );

#       NEIGHBOUR ( <source>(<direction>,B_SWAP),

#                   <target>(<direction>,B_SWAP),

#                   <checker board>, BACKWARD, <direction> ); }

#

# This allows optimized communication by getting the neighbours

# in all directions at the same time (e.g., by using "Stencils"

# on the CM-2). The <checker board> argument is as for

# NEIGHBOUR, but the <source> and <target> must now be

# declared to be of the same PRIMITIVE type as <source>

# but with an ARRAY index of size Nd. For <duplex> = BOTH

# they must also have an array index of size 2.

#

# NEIGHBOUR ( <source>, <target>, <checker board>,

#             <duplex>, <direction>[,<log2 distance>] )

#

# This carries out the operation

#

#   FORALL x DO

#     <target>[x, <checker board>] :=

#         <source>[x' <duplex> <direction>,

#                  1 - <checker board>]

#

# where <direction> stands for a unit vector or index as

# appropriate (the usual abuse of notation), and

# (anti)periodic boundary conditions are implied. <source>

# and <target> must have trivial ARRAY types (i.e., either

# they have no ARRAY dimensions at all, or they are completely

# indexed). Here, x' is the source coordinate after shifting

# by 2^distance given in log2_distance. This means that if

# log2_distance = 0, then only nearest neighbour communication

# is used and for log2_distance > 0, neighbours from a power

# of 2 are used. Note, in this case, then the source and

# target are on the same checkerboard. Arguments <duplex>,

# <checker board>, <direction> and <log2 distance> are all

# simple FORTRAN INTEGER variables, with <checker board> = 0|1,

# <duplex> = FORWARD|BACKWARD, and 0 <= <direction> < Nd.

#

# All WORD, REALITY and PRIMITIVE types are supported.

# Obviously, only LATTICE GRID type is supported

#-

 

 

#+

# PARTOF operations: PARTOF ( <source>, <target>, <function> )

#

# where <function> := REAL_PART_OF and IMAG_PART_OF

#-

 

 

#+

# SEED_TO_FLOAT operations:

#   SEED_TO_FLOAT ( <source>, <target> )

#

# Converts seeds to floating point numbers

#-

 

 

#+

# SLICE_SUMSQ operations:

#   SLICE_SUMSQ ( <source>, <orthog direction>, <target>, <option> )

#   SLICE_SUM ( <source>, <orthog_direction>, <target>, <option> )

#

# where <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

# NOTE: this macro could be generalized to a list of

# orthogonal directions (a plane, a cube, etc)

 

 

#+

#

# SPECIAL operations:

#   SPECIAL ( <source1>, <source2>, ..., <sourceN> )

#

# Provides low level access to the primitive portion of an object.

# Should only be used when absolutely necessary to gain access to

# primitive indices. The macro removes the OUTER site level and

# starts a loop to the closing SPECIAL_END (if there is an outer

# index). A DEFINE_TYPE object is provided that has the primitive indices

# exposed.

#

#    SPECIAL(source)

#

# produces

#  primitive      new object                    info

#  SCALAR    -> _SP_source                   : primitive SCALAR

#  VECTOR(N) -> _SP_source(N)                : primitive SCALAR with array size N

#  MATRIX(N) -> _SP_source(N,M)              : primitive SCALAR with array size N,M

#  TRIANG(N) -> _DIAG_SP_source(N,2)         : primitive SCALAR with array N

#               _OFFD_SP_source(N*(N-1)/2,2) : primitive SCALAR with array N*(N-1)/2

#-

 

 

#+

# Special purpose macros for manipulating spinors

#-

#+

# SPIN_PROJECT ( <source>, <direction>, <PLUS/MINUS>, <target> )

#

# This carries out the operation

#

#   <target> := (1 <+/-> gamma[<direction>]) * <source>

#

# where <source> is of PRIMITIVE type MATRIX[<colour>,Ns]

# and <target> is of PRIMITIVE type MATRIX[<colour>,Ns/2]

# (i.e, COLOUR_SPIN and COLOUR_HALFSPIN respectively).

# <source> and <target> must have REALITY type COMPLEX_64,

# and conformant GRID and WORD types.

#

# The projection operator (1 <+/-> gamma[<direction>])

# is specified by the arguments <+/-> and <direction>

# both of which are simple FORTRAN INTEGER variables,

# with <+/-> = +/-1 and 0 <= <direction> < Nd.

#-

 

 

#+

# SPIN_PRODUCT ( <source>, <Dirac algebra element>, <target> )

#

# This carries out the operation

#

#   <target> := Gamma[<Dirac algebra element>] * <source>

#

# where <source> and <target> are of PRIMITIVE type

# MATRIX[<colour>,Ns] (i.e., COLOUR_SPIN). <source>

# and <target> must have REALITY type COMPLEX_64, and

# conformant GRID and WORD types.

#

# The argument <Dirac algebra element> is a simple

# FORTRAN INTEGER variable, lying between 0 and 15 inclusive.

#-

 

 

#+

# TRICK ( <source>, <direction>, <PLUS/MINUS>,

#         <target>, <option> )

#

# <source> is of PRIMITIVE type MATRIX[<colour>,Ns/2]

# and <target> is of PRIMITIVE type MATRIX[<colour>,Ns]

# (i.e., COLOUR_HALFSPIN and COLOUR_SPIN respectively).

# <source> and <target> must have REALITY type COMPLEX_64, and

# conformant GRID and WORD types. Arguments <+/-> and

# <direction> are both simple FORTRAN INTEGER variables,

# with <+/-> = +/-1 and 0 <= <direction> < Nd. The last argument

# specifies how <target> is to be modified; the choices are

#

#     <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

#

# The TRICK operation makes use of the fact that

# (1 <+/-> gamma[<direction>]) is a projection operator,

# and thus in a suitable spinor basis can be implemented

# using only the top two components of the spinors

# (SPIN_PROJECT) with the bottom two components being

# reconstructed by TRICK:

#

#     <target>[1] = <source>[1]

#     <target>[2] = <source>[2]

#     <target>[3] = a * <source>[1] + b * <source>[2]

#     <target>[4] = c * <source>[1] + d * <source>[2]

#

# with a, b, c, and d being suitable constants from the

# set {+/-1,+/-i}.

#-
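 

For illustration only (the spinor names psi, chi, and phi are hypothetical), a typical half-spinor step pairs the two macros as in the following sketch:

```
  /* Sketch: project to a half spinor, then reconstruct the full spinor.
     isign = +/-1 and mu is a direction with 0 <= mu < Nd. */
  SPIN_PROJECT(psi, mu, isign, chi);    /* chi := (1 +/- gamma[mu]) * psi, top Ns/2 components */
  TRICK(chi, mu, isign, phi, REPLACE);  /* phi := full Ns-component spinor rebuilt from chi */
```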

 

 

#+

# SUMSQ operations: SUMSQ ( <source>, <target>, <option> )

#                   SUM ( <source>, <target>, <option> )

#

# where <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# Special purpose macros for manipulating SU(N) matrices

#

# SU2_EXTRACT ( <source>, <su2 index>, <b0>, <b1>, <b2>, <b3> )

#

# Compute the b(k) of A_SU(2) = b0 + i sum_k bk sigma_k

#

#

# SUN_FILL ( <b0>, <b1>, <b2>, <b3>, <su2 index>, <target> )

#

# Fill in B from B_SU(2) = b0 + i sum_k bk sigma_k

#-
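 

A sketch of how the pair might be used together (the names w and v and the variable su2_index are hypothetical; what is done to b0..b3 in between is application-specific, e.g. an SU(2)-subgroup update):

```
  /* Sketch: extract an SU(2) subgroup, modify it, and embed it back */
  SU2_EXTRACT(w, su2_index, b0, b1, b2, b3);  /* pull out the SU(2) components b(k) */
  /* ... update b0, b1, b2, b3 here ... */
  SUN_FILL(b0, b1, b2, b3, su2_index, v);     /* fill v from the updated SU(2) block */
```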

 

 

#+

# TRACE operations: TRACE ( <source>, <target>, <option> )

#

# where <option> ::=

#   REAL_PART | IMAGINARY_PART | COMPLEX_PART

#-

 

 

#+

# TRACE_MULTIPLY operations:

#   TRACE_MULTIPLY ( <source1>, <source2>, <target>, <option> )

#

# where <option> ::= REAL_PART | IMAGINARY_PART | COMPLEX_PART

#-

 

 

#+

# UNARY operations: <function> ( <source>, <target> )

#

# where <function> ::= ABS | EXP | COS | SIN | LOG | ACOS |

#                      SQRT | ASIN | TAN | ATAN | NOT | SIGN

#-

 

Datatype structure

The basic type declaration in SZIN is DEFINE_TYPE, which defines a hierarchical type. Its general form is

DEFINE_TYPE(<name>, <word type>, <inner grid>, <reality>, <primitive>, <outer grid>, <array index>, ...)

and a type declaration of the form

LATTICE_COMPLEX_COLOUR_COLOUR_CB_DIRECTION(`U')

just expands into

DEFINE_TYPE(`U',`REAL_32',`ILATTICE',`COMPLEX_64', `MATRIX(`Nc',`Nc')',`SCALAR',`2',`Nd')

for a vector machine and

DEFINE_TYPE(`U',`REAL_32',`SCALAR',`COMPLEX_64', `MATRIX(`Nc',`Nc')',`OLATTICE',`2',`Nd')

for the Columbia Machine or the MIT SMP machine. The possible arguments for DEFINE_TYPE are

 

Word types:

      UNSIGNED_32

      INTEGER_32

      REAL_32

      REAL_64

      BOOLEAN

    

Inner grid types:

      SCALAR

      ILATTICE

 

Reality types:

      SCALAR

      COMPLEX_64

 

Primitive types:

      SCALAR

      SEED

      TRIANG(<dimension>)

      VECTOR(<dimension>)

      MATRIX(<dimension 1>, <dimension 2>)

      PRODUCT(<TYPE 1>, <TYPE 2>)

   

Outer grid types:

      SCALAR

      OLATTICE

   

Array types: up to 5 array indices are allowed (this is easily extended); they are 0-based, and the trailing arguments to DEFINE_TYPE are their dimensions.

 

The basic type structure is described in a fiber-bundle language: the base manifold is the lattice, and at each site there is a fiber carrying some type. Mathematically, the primitive types we want at a site are either simple scalar values, or live within the product of a vector space over colour components with a vector space over spin components. This is encapsulated in the PRODUCT primitive type above, where TYPE 1 and TYPE 2 are composed of the basic types listed before PRODUCT. So, generically we want

·        Scalars:        Scalar

·        Gammas:         Product(Scalar, Matrix(Ns))

·        Propagators:    Product(Matrix(Nc), Matrix(Ns))

 

where Nc indicates the dimension of the color vector space and Ns indicates the dimension of the spin vector space. For example, Scalars can multiply gauge fields or fermions. A gamma matrix can multiply a fermion, but not a gauge field. A gauge field can multiply a fermion, but a fermion cannot multiply another fermion unless there is a conjugation involved.

For example, a fermion can be laid out in memory as an Nc x Ns matrix, but mathematically a fermion is not equivalent to a matrix.
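 

Following the gauge-field expansions shown above, a lattice fermion declaration for a scalar machine might look like the following sketch (the name psi and the exact argument choices are illustrative, patterned on the DEFINE_TYPE expansion earlier in this section):

```
/* Hypothetical: a checkerboarded lattice fermion, stored with the
   MATRIX(Nc,Ns) primitive type, in the OLATTICE (scalar-machine) layout */
DEFINE_TYPE(`psi',`REAL_32',`SCALAR',`COMPLEX_64', `MATRIX(`Nc',`Ns')',`OLATTICE',`2')
```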

 

Example

/* @(#)polylp.m        11.2 (8/22/96) */

/* POLYLP -- calculate the global normalized sum of the Polyakov loop */

/* u -- gauge field (Read) */

/* poly_loop -- Polyakov loop average in direction mu (Write) */

/* mu  -- direction of Polyakov loop (Read) */

 

include(types.mh)

 

SUBROUTINE(polylp, u, poly_loop, mu)

 

LATTICE_COMPLEX_COLOUR_COLOUR_CB_DIRECTION(u);

DCOMPLEX(poly_loop);

INTEGER(mu);

{                           /* Local Variables */

  include(COMMON_DECLARATIONS);

 

  LATTICE_COMPLEX_COLOUR_COLOUR(u_dble);

  LATTICE_COMPLEX_COLOUR_COLOUR(poly);

  LATTICE_COMPLEX_COLOUR_COLOUR(tmp_1);

  LATTICE_COMPLEX(poly_trace);

  INTEGER(cb);

  INTEGER(n);

  DOUBLE(dummy);

 

  START_CODE;

 

  ALLOCATE(u_dble);

  ALLOCATE(poly);

  ALLOCATE(tmp_1);

  ALLOCATE(poly_trace);

 

  cb = 0;                   /* Start from even checkerboard */

 

  /* Construct double links */

  MULTIPLY(u(cb,mu), COMMUNICATE(u(1-cb,mu),cb,FORWARD,mu), u_dble, REPLACE);

 

  /* Multiply together to get the Polyakov loop */

  COPY(u_dble, poly, REPLACE);

 

  for(n = 2; n <= VALUE(nrow(mu))/2; ++n) /* runs over half the linear size */

  {   

    NEIGHBOUR(poly, tmp_1, cb, FORWARD, mu, 1);

    MULTIPLY(u_dble, tmp_1, poly, REPLACE);

  }

 

  /* Take the trace and sum up */

  TRACE(poly, poly_trace, COMPLEX_PART);

  SUM(poly_trace, poly_loop, REPLACE);

 

  FREE(poly_trace);

  FREE(tmp_1);

  FREE(poly);

  FREE(u_dble);

 

  dummy = TO_DOUBLE(1) / TO_DOUBLE(Nc*vol_cb);

  MULTIPLY(poly_loop, dummy, poly_loop, REPLACE);

 

  END_CODE;

}

 

 

Getting the source via CVS

The source code is contained in a CVS repository at JLab. Those with an account at JLab can get the code with a CVSROOT variable as below

 

setenv CVSROOT  :ext:jlabs1.jlab.org:/home/edwards/cvsroot

setenv CVS_RSH ssh

 

cvs checkout szin

 

The CVS home site with documentation is     http://www.cvshome.org

 

There is a useful tutorial/on-line book at http://cvsbook.red-bean.com/cvsbook.html

 

The code is also available by anonymous "pserver" access (NOTE: I'm still trying to get this to work). The source code is also available as a tarball on the web site.

Rather than wiping out your working directory each time you update to a new tarball, use CVS instead. This is called tracking a third-party source.

 

1) Go into your home directory (or wherever) and do a

% cvs init

 

This will create ~/cvsroot and ~/cvsroot/CVSROOT . This

is the location of your repository.

 

2) Get the tar from the web site and untar it in say /tmp. Then

% cd /tmp/szin

% cvs import szin RGE szin13-6   # the szin is your module name

                                 # RGE is the vendor name (something unique)

                                 # szin13-6 is the release tag. Actually you

                                 # make up all of these - nowhere are you

                                 # getting these names from me

% rm -rf /tmp/szin

% cd <whatever work directory you want>

% setenv CVSROOT  ~/cvsroot

% cvs checkout szin

% cd szin   # this is now your working directory

 

3) Say some time later you want to update to a newer version,

not necessarily the next release. Get the tar file and untar it

in say /tmp

% cd /tmp/szin

% cvs import szin RGE szin13-9

% rm -rf /tmp/szin

% cd <workdir>/szin

% cvs update

% <use all your favorite cvs commands>

 

Note this will now track changes. If you edit and commit some changes, and later I also make changes, you will merge them when you do the next import.

 

Note, with sufficient permissions on the repository files (like group access), anybody else there can also update your cvs source files.

 

Also note that the downside of this method is that you don't have the real repository history - you can't see the logs, etc., of why somebody here changed something.

 

Running the program under MPI

For compilation with MPI, the flags in Makefile.<machine_type> will probably need to be changed. At JLab, the MPI and LIBS flags are

 

MPI = /usr/local/mpich-gm

 

LIBS = -L$(MPI)/lib -L/usr/local/gm/binary/lib -L$(SMPDIR) -lmpich -lgm \

       -lQCDutils -lpthread -lots -lm

 

A sample configuration file (gm.conf) for MPI over Myrinet's GM is

2

hpci3a 2

hpci2a 2

 

To run the program at JLab, do

/usr/local/mpich-gm/bin/mpirun.ch_gm [-np #] --gm-v --gm-f gm.conf ../../main/tests/kxtsmp_t_mesplq_g -n 1
