SZIN Macro Package

Date: January 2, 2002

Introduction

How to build

Machine architecture

Fermion types

Programs

Description

Parameters

Spin conventions

Output

Directories

Program constructs

Macro operations

Datatype structure

Example

Getting the source via CVS

Introduction

The SZIN software system (SZIN means color in Hungarian) supports data-parallel programming constructs for lattice field theory, and in particular lattice QCD. It is an object-oriented, macro-based C system that presents a single high-level code image to the user, but can generate highly optimized code for many architectures, including single-node workstations, multi-process or multi-threaded SMP workstations, clusters of workstations via MPI or GM (Myrinet), the (now defunct) CM-2, vectorizable Crays, and the QCDSP.

 

This manual can be found at    http://www.jlab.org/~edwards/szin

 

How to build

To install, first run "configure". Look at the top of the configure file, which describes how MACHINE_TYPE's and FERMION_TYPE's are used. To initialize for an Intel box running Linux, with code that is "scalar" only (a single box with one processor), you would run

 

perl configure MACHINE_TYPE=linux

 

Then, go into the mainprogs/main or mainprogs/tests directory. To build for Wilson fermions, do

 

gmake MACHINE_TYPE=linux FERMION_TYPE=w linux_t_mesplq_w

 

which builds the simple plaquette-measurement test program. The general syntax for the build of a main program is as follows:

 

    gmake MACHINE_TYPE=<type> FERMION_TYPE=<fermtype>  <type>_<main prog name>_<fermtype>

 

where <main prog name> is one of the files in the mainprogs/tests or mainprogs/main directories.

Machine architecture

The MACHINE_TYPE is a string describing the compiler and system architecture. For example, <type>=linux is single-box, single-processor code for a (nominally) x86-class processor running Linux. More about the known MACHINE_TYPE's is below.

Currently <type> can be one of the following. Note, case is significant.

 

The allowed fermion types <fermtype> are

 

Supported architecture types are

 

Fermion types

The FERMION_TYPE is one of "w" (Wilson-like fermions, namely 4 spin components), "s" (staggered-like fermions, 1 spin component), and "g" (pure gauge). The latter is obviously not a fermion type, but is used when linking against pure-gauge-only main programs and measurements.

Programs

Description

Generically, the "szin/mainprogs/main" files are drivers around some work routine. In the case of the spectro programs, there is a spectroN_w.m and a corresponding hadronN_w.m.

Spectro

The first simple spectroscopy version, using only wall and point sources.

 

Spectro2

NOTE: spectro2 and spectro4 are the preferred S-wave spectroscopy programs

 

S-wave meson, meson currents, and baryon spectroscopy.

A souped-up spectro.m. Added all the FFT-based machinery for shell sources via smearing (in momentum space via FFT), boundstate sources, heavy-light, and same-valence but mixed smearing at source and sink, for use in smearing-matrix constructions. All S-wave stuff. Urs put a lot of work into it, and we used it as a major workhorse for all our spectrum work. The real work routine is hadron2_w.m. It is rather big and, in my opinion, unwieldy.

Spectro4

NOTE: spectro2 and spectro4 are the preferred S-wave spectroscopy programs

 

The FFT stuff is not very portable. The latest and preferred (?) S-wave spectroscopy program is spectro4. It is an updated and stripped-down spectro2; I nuked bound-state sources and sinks, since that turned out to be a suboptimal approach.

Spectro3

P-wave meson sources & sinks. So, can do the a_1, etc.

 

The measurement routines are in

szin/lib/meas/hadron

 

Also, there are "stripper" scripts (perl scripts) that I am putting into szin/scripts; they strip out the data and dump it into some directory. The stripper names match the mainprog names. They all read stdin, so I cat or zcat in the whole pile of namelist files; each script runs through them, pulling out the zillions of channels in each config and recombining them into unique files.

 

Parameters

Globals

Most of the important parameters are global variables. They can be found in szin/macros/common_declarations.h. Here is a description of some of the variables.

 

Fundamental

Nd                   number of dimensions

Nc                   number of color components

Ns                   number of spin components

FermType        global var describing the type of fermions: Wilson like, staggered like, or pure gauge. Wilson like means 4 spin components, so Wilson, Clover, Overlap, etc. Staggered like means 1 spin component, like Naik, and pure gauge is effectively Ns=0.

 

Fermions

FermAct           global var holding the fermion action. The file macros/primitives.mh holds the relevant values. "OVERLAP_POLE" is what I use for the overlap (the Neuberger construction), but using our Remez approximation to epsilon. "TRUNC_OVERLAP" is Neuberger's version.

OverAuxAct     the auxiliary action used in the overlap operator. By default it is the hermitian (unpreconditioned) Wilson operator, gamma_5*(1 - kappa*D_wilson). Note the kappa normalization.

NwilsVec            number of vectors for projection. Affects various overlap versions.

RatPolyDeg      the name and variable space were taken from Stefan/Ivan/Tony's rational polynomial approximation stuff. Here, it is the order of the polynomial in x*P(x^2)/Q(x^2), so P(y) ~ const*y^{RatPolyDeg} + ... . For OVERLAP_POLE I set it to 1. For TRUNC_OVERLAP and dwf-type stuff it functions as L_s.

PolyArgResc    for TRUNC_OVERLAP and dwf it acts like a_5, the scale factor inside the epsilon or the 5-d lattice spacing.

PolyRangUp     for OVERLAP_POLE. The maximum eigenvalue of OverAuxAct. The largest of the smallest eigenvalues should be within a suitable range of the 14th-order Remez approximation used.

PolyRangLow  for OVERLAP_POLE. The minimum eigenvalue of OverAuxAct.

 

A caveat about OVERLAP_POLE: in typical use, projection is always used.

NWilsVec can be -1 to compute one low-lying (and the maximum) eigenvalue, but not use the low ones in projection. The main goal there is to establish rescaling.

The largest eigenvalue of OverAuxAct is rescaled to 4 (this is all in conslinop_w.m). This is a little aggressive, but 4 is considered the maximum eigenvalue acceptable in our 14th-order Remez approximation.

 

Spectroscopy

The kinds of smearing available (in spectro2 and spectro4) are classed like:

 

Z3_src             Z(3)-style sources (a crystal). This can replicate the same source across many sites, but with a random Z(3) phase. Works well in really big volumes.

Pt_src/snk            point source/sink

Wl_src/snk            wall source/sink

Sl_src/snk            shell source/sink

Bd_src/snk       boundstate source/sink. A boundstate is a meson or baryon with one quark a point source and the other quark a shell source.

 

Within shell-source constructions (including boundstate), different wavefunction constructions are used. I often thought of having a point wave function and a wall wave function, but the latter doesn't work mathematically (nicely, at least).

 

Wvf_kind            wave function kind:

    (1 = GAUSSIAN_WVF) gauge-variant Gaussian wvf.

    (2 = EXPONENTIAL_WVF) gauge-variant exponential wvf

    (3 = GAUGE_INV_EXPONENTIAL_WVF) gauge-invariant gaussian wvf

    (4 = WUPPERTAL_WVF) wuppertal

 

There are also parameters for the shell-like smearing. In the gauge-variant gaussian exponential, the smearing function is exp(-p^2/2*width). [I need to double check the 2]. The gauge-covariant version exponentiates via (1 - laplac(U)*width/N)^N .
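As a numerical illustration of the covariant form, here is a standalone C sketch (not SZIN code; the scalar x stands in for an eigenvalue of the covariant Laplacian acting on a mode): the iterated kernel (1 - x*w/N)^N approaches the exponential exp(-x*w) as N grows.

```c
#include <assert.h>
#include <math.h>

/* Iterated smearing kernel for a scalar stand-in x of the Laplacian:
 * computes (1 - x*w/n)^n, which tends to exp(-x*w) for large n. */
double iterated_smear(double x, double w, int n)
{
    double f = 1.0;
    for (int i = 0; i < n; ++i)
        f *= 1.0 - x * w / n;
    return f;
}
```

This is why WvfIntPar (the iteration count) controls how closely the covariant smearing tracks the exponential profile with the same width.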

 

For both gauge-variant and gauge-invariant smearing, one reads

 

wvf_param       array of the smearing widths. The array has one element matching each kappa in the list.

WvfIntPar            array of the number of smearing iterations (used for covariant smearing)

 

The smearing width has the same meaning in both invariant and variant cases.

Spin conventions

The spin conventions are contained in the macro file spinor.mh. The default spin conventions are the same as the MILC code and also the Columbia Physics System. The basis is a chiral one. The gamma matrices are below.

 

Here is the basis. Note, they are labeled gamma_{1,2,3,4} for x,y,z,t in that order. You could also think of x as the 0th direction, so they run from 0 to 3, inclusive.

 

# gamma(   1)   0  0  0  i            # ( 0001 )  --> 1

#               0  0  i  0

#               0 -i  0  0

#              -i  0  0  0

 

# gamma(   2)   0  0  0 -1            # ( 0010 )  --> 2

#               0  0  1  0

#               0  1  0  0

#              -1  0  0  0

 

# gamma(   3)   0  0  i  0            # ( 0100 )  --> 4

#               0  0  0 -i

#              -i  0  0  0

#               0  i  0  0

 

# gamma(   4)   0  0  1  0            # ( 1000 )  --> 8

#               0  0  0  1

#               1  0  0  0

#               0  1  0  0

 

 

People often ask about changing the spin basis. The basis is a chiral one and is changeable in the macros. It is not a trivial change, such as flipping a parameter to turn on a different basis (I'd like to switch to that, though). Moreover, specially written optimized codes for the Dirac operator have the basis hardwired in - they are low-level codes! Probably more of a pain are the baryons. The baryon wave functions and output are basis dependent. One can always do a similarity transformation (e.g., full propagators can always switch bases); however, all the baryon spin source and sink components need to be saved to do this rotation, and they are not all currently saved.

 

The way we do mesons is enumerate the possible gamma matrix products

Gamma = gamma_1^n1 * gamma_2^n2 * gamma_3^n3 * gamma_4^n4

 

where the n_i are single-bit fields. Since gamma_1 comes first, its bit comes first (least significant). So then, for example,

gamma_5 is represented as  1111b = 15d .

gamma_1*gamma_2*gamma_4 is represented as  1011b = 11d (note the ordering).

 

This enumeration is gamma basis independent.
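In plain C the enumeration is just a 4-bit pack, with the gamma_1 bit least significant (a sketch, not SZIN code):

```c
#include <assert.h>

/* Index of Gamma = gamma_1^n1 * gamma_2^n2 * gamma_3^n3 * gamma_4^n4,
 * where each n is 0 or 1 and the gamma_1 bit is least significant. */
int gamma_index(int n1, int n2, int n3, int n4)
{
    return (n1 << 0) | (n2 << 1) | (n3 << 2) | (n4 << 3);
}
```

So gamma_5 = gamma_1*gamma_2*gamma_3*gamma_4 lands at index 15, and gamma_1*gamma_2*gamma_4 at index 11, exactly as above.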

 

Output

Spectroscopy

First, mesprop and bar_prop are the simple 2-point same-source-and-sink spin constructions (up to a possible gamma_3, assuming here that gamma_3 (0-indexed) is the time direction). The currents are for when you do non-local constructions, like building a conserved vector current for Wilson fermions (use a gauge field to hop one site). Axial is similar but partially conserved.

The objects one wants for a conventional mass analysis are the mesprop and barprop objects. The "array" indices used for all these things have a Fortran ordering - namely, the first to last index runs fastest to slowest varying.

For mesons (you can see this in szin/lib/meas/hadron/wrimes_w.m):

 

mesprop(time_extent, Ns*Ns)

mesdisp(time_extent, num_momentum_channels, Ns*Ns)
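The column-major (Fortran) linearization of these arrays can be sketched in C as follows; the function name is illustrative, not SZIN's:

```c
#include <assert.h>

/* Fortran (column-major) linear index for mesprop(time_extent, Ns*Ns):
 * the first index, time, varies fastest. */
int mesprop_index(int t, int gamma_idx, int time_extent)
{
    return t + time_extent * gamma_idx;
}
```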

 


 

Only 3-momenta are considered, and they are enumerated in a packed order to form an array index. Currently, all relevant permutations of a 3-momentum are averaged over. E.g., (0,0,0) is index 0, (1,0,0) is index 1, (1,1,0) is index 2, and so on up in |p|^2. The jump is at 9 (the first degenerate p^2).
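The claim that 9 is the first degenerate |p|^2 can be checked with a standalone C sketch (not SZIN code) that enumerates canonical momenta n1 >= n2 >= n3 >= 0:

```c
#include <assert.h>

/* First p^2 value realized by more than one canonical momentum
 * (n1 >= n2 >= n3 >= 0): e.g., 9 = (3,0,0) = (2,2,1). */
int first_degenerate_psq(void)
{
    int count[32] = {0};   /* count[p^2] over canonical triples */
    for (int n1 = 0; n1 <= 5; ++n1)
        for (int n2 = 0; n2 <= n1; ++n2)
            for (int n3 = 0; n3 <= n2; ++n3) {
                int psq = n1*n1 + n2*n2 + n3*n3;
                if (psq < 32)
                    count[psq]++;
            }
    for (int psq = 0; psq < 32; ++psq)
        if (count[psq] > 1)
            return psq;
    return -1;
}
```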

Directories

File name conventions

 

Main programs

All main programs link against <type>_szin.a, <type>_szin_[wsg].a and <type>_szin_x.a. These libraries are kept in the "lib" directory. The first library, <type>_szin.a, contains fermion-independent code - namely, code that might use fermions (like hybrid.m, the HMC code) but doesn't need to know exactly which fermion. Heat-bath codes are in here. The <type>_szin_[wsg].a is fermion dependent, like hadron measurements where operators are specific to the fermion type. The <type>_szin_x.a is the "external" library. The way SZIN works is that each M4 macro call on a lattice object produces a "C" (or possibly assembly) file which lives in lib/ext/ext_<type>. These files have hashcoded names to reduce their length to something linkers are happy with.

Main

The directory mainprogs/main contains the main physics codes.

Tests

The directory mainprogs/tests contains various test programs used for debugging and code verification.

Tests

The tests directory contains sample namelist input and namelist output files for various programs. More to come. The names of the subdirectories of tests correspond to program names in the mainprogs/main and mainprogs/tests directories. There is a “*.ini” or “*.ini0” file; this is a namelist input file that is read in. Look at the file for instructions and sample input. The “.ini” file is copied onto the DATA file. This file is read by the programs and may name a config file; the programs write NMLDAT, RESULT, and possibly new DATA and CFGOUT files.

Macros

The SZIN macros are kept in macros

What defines SZIN are the macros used for preprocessing. A casual user doesn't need to dive into these things. However, there are useful macro constants defined in macros/primitives.mh that are used by many programs - constants like the number selecting the CG inverter are in this file. The "makestuff" directory contains stuff used by make (not surprisingly). The "root" directory contains the true main programs; these in turn call the "mainprogs". The true main programs are used to set up environments, etc. They have names like stand_main.c, smp_main.c, and tsmp_main.c, which respectively set up the environments for generic single-processor code (stand_main.c), multi-process SMP code (possibly multi-node as well) (smp_main.c), and threaded SMP code (possibly multi-node as well) (tsmp_main.c).

Lib

Actions

Construct

Linear operators

 

The directory containing the known linear operators is

 

szin/actions/linop

 

and

 

szin/actions/construct

 

is the location of the constructors/destructors. These constructors are just convenient packaging of the above code for a linear operator. Otherwise, there is nothing fundamental about them. They are, however, called by the Qprop routines.

 

The program conslinop_w.m has all the currently used linear operators. There may be some more in the szin/actions/linop directory than are used in conslinop_w.m (there are also staggered ones). By convention, all the routines begin with "l".

 

ldpsi_w.m       - The unpreconditioned Wilson-Dirac operator

lovddag_w.m     - The overlap-Dirac (pole approx) of  D^dag*D. This uses

   an auxiliary operator (inside the epsilon()) that is

   often the unprec. Wilson-Dirac operator.

lovlapms_w.m    - The overlap  "D".

ldwf_w.m        - Unprec. dwf operator (5D).

 

For the preconditioned DWF (5D) Shamir operator, many routines come into play. The operator itself is leompsi_w.m. It also uses ldwfproj_w.m, ldwfdiag_w.m, ldwfainv_w.m, and dwfahat_w.m. Without projection, many of these routines are trivial.

Dirac

Dslash

Gauge

Invert

Linop

Qprop

Quark propagator routines. The driver is Qprop, which calls the appropriate quark propagator routine based on the value of FermAct.

Meas

Eig

Gfix

Glue

Hadron

Pbp

Schrfun

Smear

Geom

Routines that set up the geometry and boundary conditions for the code. The geometry (defined in routines setgeom.* and setgeom32.*) refers to how the Nd-dimensional grid is laid out in memory and across multiple nodes.

The boundary conditions (routines setbc.* and setbc32.*) set things like offsets used for communications within memory and between nodes.

The code is strictly a checkerboarded code, and the state is contained in the variable Nsubl. Namely, there can be 2 checkerboards (even/odd), and in Nd dimensions there can be 2^Nd sublattices.
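For the even/odd case, the checkerboard of a site is just its coordinate sum mod 2; here is a standalone C sketch (not SZIN code):

```c
#include <assert.h>

/* Site parity for a 2-checkerboard (even/odd) layout:
 * 0 = even checkerboard, 1 = odd checkerboard. */
int checkerboard(const int x[], int nd)
{
    int sum = 0;
    for (int mu = 0; mu < nd; ++mu)
        sum += x[mu];
    return sum & 1;
}
```

Nearest-neighbor shifts always move a site from one checkerboard to the other, which is why the NEIGHBOUR macro below relates source and target on opposite checkerboards.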

Scripts

Namelist_diff            - a slick namelist file differ

Build_vpath            - constructs the vpath used in lib/Makefile

 

The rest of the scripts strip namelist data and write it into files in a specified directory. Generically, the file names match the program they are designed to strip and possibly output for a specific routine

 

All the strippers take in stdin, so I cat or zcat in the whole pile of namelist files and it runs through them pulling out the zillions of channels in each config and recombines them into unique files.

 

This really should all go away, with the data output in XML or the like. Still, you need to know where to put the stuff and what to name it on disk. More to be done.

Program constructs

Input/Output

There are three classes of input/output, called NAMELIST, BINARY and TERMINAL. The relevant operations are below. Note, the file handles (buffers) are explicit in the open/create/close instructions and implicit in the write/read instructions.

 

CREATE_NAMELIST(nml_out,  “A_namelist_file”);

WRITE_NAMELIST(<group>, <var1>,…,<var n>);

CLOSE_NAMELIST(nml_out);

 

OPEN_NAMELIST(nml_in,  “A_namelist_file”);

READ_NAMELIST(<group>, <var1>,…,<var n>);

CLOSE_NAMELIST(nml_in);

 

CREATE_BINARY(cfg_out,  “A_binary_file”);

WRITE_BINARY(<var1>,…,<var n>);

CLOSE_BINARY(cfg_out);

 

OPEN_BINARY(cfg_in,  “A_binary_file”);

READ_BINARY(<var1>,…,<var n>);

CLOSE_BINARY(cfg_in);

 

CREATE_TERMINAL(trm_out,  “A_terminal_file”);

WRITE_TERMINAL(<var1>,…,<var n>);

CLOSE_TERMINAL(trm_out);

 

OPEN_TERMINAL(trm_in,  “A_terminal_file”);

READ_TERMINAL(<var1>,…,<var n>);

CLOSE_TERMINAL(trm_in);

 

The way data (namely parameters) is read in and written out is via “NAMELIST”. This format is taken over from Fortran and has a group structure like:

 

&param

  var = 3.7,

  This_is_a_list = (1.0, 2, 3),

  This_is_complex = (1, 2),

  A_string = “this is a string”,

  This_is_another_list(0) = 1.0,

  This_is_another_list(1) = 2,

  This_is_another_list(2) = 3,

&END

 

The names are representative of the types. The data is read using code like

 

REAL(var);

REAL(This_is_a_list, 3);

REAL(This_is_another_list, 3);

COMPLEX(This_is_complex);

STRING(A_string);

 

<do some allocates and other stuff>

 

OPEN_NAMELIST(nml_in, “some_file”);

READ_NAMELIST(param, A_string,var,This_is_a_list,This_is_another_list);

READ_NAMELIST(param,This_is_complex);

CLOSE_NAMELIST(nml_in);

 

Note: the parser does not assign types to the expressions parsed. When READ_NAMELIST is called on the individual variable, the expected type is known and the data is converted to that type. By default, all variables are `0’, so if there is an error on reading (e.g., the variable is not in the namelist file, or an integer type is read but the data was a string), then the default is used. Also, the order of the parameter entries and even the order of the namelist groups is not important. Currently, the entire namelist file is parsed, then READ_NAMELIST looks for a specific group and a specific variable. It is an error to have the same group twice, or the same variable name entered twice within the same group.

 

Writing data is similar, but now a single use of WRITE_NAMELIST(<group>, <vars>,…) writes the whole group. Writing 2 instances of WRITE_NAMELIST with the same group name makes two instances of that group. The output file is buffered in the usual file I/O buffering way. To open a file and to write something using the above declarations would look like

 

CREATE_NAMELIST(nml_out, “another_file”);

WRITE_NAMELIST(stuff, A_string, This_is_complex);

CLOSE_NAMELIST(nml_out);

 

Reading and writing binary data is what you expect. The following

 

CREATE_BINARY(cfg_out, “a_binary_file”);

WRITE_BINARY(This_is_complex,This_is_another_list);

CLOSE_BINARY(cfg_out);

 

OPEN_BINARY(cfg_in, “a_binary_file”);

READ_BINARY(This_is_complex,This_is_another_list);

CLOSE_BINARY(cfg_in);

 

writes and then reads in the variables “This_is_complex” and “This_is_another_list”. Since the underlying code is “C”, one should not expect record marks or other such antique things in the binary file. On all platforms, the binary format is always “big-endian” – namely, the byte ordering of SUN, HP, and IBM RIOS machines, and the opposite of Alpha and Intel.
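When the host byte order differs from the fixed file format, each word has to be byte-swapped on the way in or out. A standalone C sketch (not the SZIN I/O code) of the idea:

```c
#include <assert.h>
#include <stdint.h>

/* returns 1 on a big-endian host, 0 on a little-endian host */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* reverse the bytes of one 32-bit word */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}
```

A reader on a host whose order matches the file format copies words through unchanged; otherwise it applies bswap32 to every word.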

 

 

Linear Operator

In the language of Lisp, the LINEAR_OPERATOR acts as a "funarg" constructor. The idea here is that all the linear operators, namely things like applying the preconditioned Wilson-Dirac operator on a vector (a Lattice_Fermion) all look generically the same. Namely, they take an input vector, some structure holding info like gauge fields, the number of checkerboards it acts on, the "sign" of the operator (whether it is D or D^dagger), and then the output. The LINEAR_OPERATOR holds all the additional arguments to pass in including the address of the function. The syntax of the linear operator constructor is

 

CONSTRUCT_LINEAR_OPERATOR(A, <function name>, <arg 1>, <arg 2>, ..., <arg n>);

 

Here is a skeleton outline of syntax mixed with code usage:

 

 

LINEAR_OPERATOR(A);  /* this is the declaration */

ALLOCATE(A);         /* reserves memory for the structure itself */

 

/* Possibly allocate space for some of the args below */

ALLOCATE(arg1);  /* Note, this may not be used/needed/wanted */

 

/* This constructs the operator (or rather assigns parts of A to the args) */

CONSTRUCT_LINEAR_OPERATOR(A, <function name>, <arg 1>, <arg 2>, ..., <arg n>);

 

/* The syntax for calling A */

CALL(A, <some linear operator holding info, usually A>, <args for A>);

 

/* In the code, the convention is always */

CALL(A, A, <input fermion>, <output fermion>, <(int)# checkerboards>,<(int)sign>);

 

/* Inside, the code for the linear operator, you'll need the args */

/* Get the args by unpacking */

/* Note, the args correspond in order and type (but not necessarily name) */

/* to the args in the construct_linear_operator */

UNPACK_LINEAR_OPERATOR(A, <arg 1>, <arg 2>, ..., <arg n>);

 

/* Then use code as expected */

 

 

/* Possibly free space for some of the args above */

/* Note, since the memory allocator is stack based, one must undo */

/* the allocations in the correct (opposite) order */

FREE(arg1);

 

/* Free the space reserved by the linear operator */

FREE_LINEAR_OPERATOR(A); 

 

/* Free the linear operator */

FREE(A);

 

 

Check out szin/lib/actions/construct/conslinop_w.m and look for the CLOVER case.

 

The implementation of LINEAR_OPERATOR can be found in szin/macros/linop.mh. It is a structure with an element for the address of the function and a void ** pointer for places to hold the memory. ALLOCATE(A) allocates this structure, then CONSTRUCT_LINEAR_OPERATOR allocates the internal arrays and assigns values (from the arg pointers) to the elements. In the case of a scalar variable passed in, memory is allocated and the value of the scalar variable is copied to this new location; the address of that location is used like the other pointers.
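The funarg idea can be sketched in plain C (illustrative only - the real implementation is the macro set in szin/macros/linop.mh): a struct bundles the function address with copies of the constructor arguments, including a heap copy of any scalar.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LinOp LinOp;
typedef void (*linop_fn)(LinOp *A, const double *in, double *out, int n);

struct LinOp {
    linop_fn fn;     /* address of the work routine */
    void    *args;   /* captured constructor argument(s) */
};

/* example work routine: out = scale * in, with scale captured in args */
static void scale_op(LinOp *A, const double *in, double *out, int n)
{
    double scale = *(double *)A->args;   /* UNPACK analogue */
    for (int i = 0; i < n; ++i)
        out[i] = scale * in[i];
}

/* CONSTRUCT_LINEAR_OPERATOR analogue: copy the scalar arg into owned storage */
LinOp *construct_scale_op(double scale)
{
    LinOp *A = malloc(sizeof *A);
    double *p = malloc(sizeof *p);
    *p = scale;
    A->fn = scale_op;
    A->args = p;
    return A;
}

/* CALL analogue */
void call_linop(LinOp *A, const double *in, double *out, int n)
{
    A->fn(A, in, out, n);
}

/* FREE analogues: release in the opposite order of allocation */
void free_linop(LinOp *A)
{
    free(A->args);
    free(A);
}
```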

 

Macro operations

The currently defined macro operations are

 

 

ALLOCATE:

Memory stack allocate

    ALLOCATE(<arg>)

where <arg> has been defined.

 

 

FREE:

Memory stack free

    FREE(<arg>)

where <arg> has been defined.

 

 

BINARYOP:

Add or subtract two fields and write onto a target

    BINARYOP ( <source1>, <source2>, <target>, <option> )

 

Where <option> :== ADD | SUBTRACT

 

BINOPS:

    <function> ( <source1>, <source2>, <target> )

 

where <function> := MOD, MIN, MAX, DIVIDE, AND, OR, XOR

 

CAST:

Makes explicit the indices of a primitive type, or as an overloading hides the indices. This is really an ugly macro and is sort of a last resort for changing types. It is not meant to be really fast.

    CAST ( <source>, <target> )

 

Ex.  Matrix(Nc,Nc) -> Array(Nc,Nc)

or    Array(Nc,Nc)  -> Matrix(Nc,Nc)

 

 

 

#+

# CMPLX operations:

#   CMPLX ( <source1>, <source2>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# COERCE operations:

#   <function> ( <source>, <target> )

#

# where <function> := FLOAT, INT, BOOL, FLOOR,

#                     CEILING, TRUNCATE, ROUND

#-

 

 

#+

# COMPARE operations:

#   <function> ( <source1>, <source2>, <target> )

#

# where <function> := LT, GT, LE, GE, EQ, NE

#

 

 

#+

# CONJG operations:

#   CONJG ( <source>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

CONTRACT:

Contract 2 propagator fields onto a target field.

 

    CONTRACT ( <source1>, <source2>, <target>, <option> )

 

where <option> :== ADD | SUBTRACT | NEGATE | REPLACE

 

The sources and targets must all be propagators but not necessarily of the same lattice type. Effectively, one can use this to construct an anti-quark from a di-quark contraction. In explicit index form, the operation does

 

target^{k' k}_{\alpha\beta} <option>=

   \epsilon^{i j k}\epsilon^{i' j' k'}  source1^{i i'}_{\rho \alpha}  source2^{j j'}_{\rho \beta}

 

and is only appropriate for Nc=3  (or SU(3)).
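This epsilon-epsilon contraction can be sketched directly for Nc=3 in standalone C (not SZIN code; real propagators are complex, but real entries suffice to show the index structure, and the REPLACE option is assumed):

```c
#include <assert.h>

#define NC 3
#define NS 4

/* epsilon^{ijk}: +1 for cyclic permutations of (0,1,2), -1 for
 * anticyclic, 0 if any two indices coincide */
static int eps(int i, int j, int k)
{
    if (i == j || j == k || i == k)
        return 0;
    return (j == (i + 1) % 3) ? 1 : -1;
}

/* tgt[k'][k][alpha][beta] = eps^{ijk} eps^{i'j'k'}
 *     s1[i][i'][rho][alpha] * s2[j][j'][rho][beta]   (REPLACE) */
void contract(double s1[NC][NC][NS][NS], double s2[NC][NC][NS][NS],
              double tgt[NC][NC][NS][NS])
{
    for (int kp = 0; kp < NC; ++kp)
        for (int k = 0; k < NC; ++k)
            for (int a = 0; a < NS; ++a)
                for (int b = 0; b < NS; ++b) {
                    double sum = 0;
                    for (int i = 0; i < NC; ++i)
                        for (int j = 0; j < NC; ++j)
                            for (int ip = 0; ip < NC; ++ip)
                                for (int jp = 0; jp < NC; ++jp)
                                    for (int r = 0; r < NS; ++r)
                                        sum += eps(i, j, k) * eps(ip, jp, kp)
                                             * s1[i][ip][r][a] * s2[j][jp][r][b];
                    tgt[kp][k][a][b] = sum;
                }
}
```

A quick consistency check: contracting two color-spin identity "propagators" gives 2*delta_{kk'}*delta_{alpha beta}, from the identity sum_{ij} eps^{ijk} eps^{ijk'} = 2 delta_{kk'}.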

 


 

 

#+

# COORD operations: COORD ( <source>, <target> )

#

# where <source> := scalar integer indicating

#                   direction (between 0 and Nd-1)

# and   <target> := lattice integer filled with

#                   coord. along <source> dir.

#-

 

 

#+

# COPY operations:

#   COPY ( <source>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# COPYMASK operations:

#   COPYMASK ( <source>, <mask>, <target>, <option> )

#

# where <option> := REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# FILL operations: FILL ( <target>, <value> )

#

# Ex. <value> can be a lattice scalar of the same

#     primitive type as <target> (a lattice type).

# Or  <value> can be a lattice scalar of a lower

#     primitive type of <target> (a lattice type),

#     so fill a matrix with a diagonal of <value>

# Or  <value> can be a special symbol like

#     ZERO,ONE,RANDOM,GAUSSIAN.  If <target> is a

#     Matrix, then filled with a diagonal matrix.

#     If <target> has Matrix primitive, then

#     RANDOM,GAUSSIAN fill the entire matrix

#-

 

 

#+

# INDEXING operations:

#   INDEXING ( <source1>, <coordinate>, <target> )

#-

 

 

#+

# IO operations: these come in three varieties,

# binary (machine-dependent configuration files),

# NAMELIST (portable machine-readable data files), and

# terminal (human but not machine readable).

#

# CREATE_BINARY(unit_num, "filename"),

# OPEN_BINARY(integer, "filename"),

# WRITE_BINARY, READ_BINARY, CLOSE_BINARY(unit_num)

#

# WRITE_NAMELIST

#

# READ_TERMINAL, WRITE_TERMINAL

#-

 

 

#+

# MULTIPLY

# 

# operations: MULTIPLY ( <source1>, <source2>, <target>, <option> )

# 

# The allowed forms of source1 and source2 are

#   source

#   CONJUGATE(source)

#   COMMUNICATE(source,<cb>,<sgn>,<dir>)

#   CONJUGATE(COMMUNICATE(source,<cb>,<sgn>,<dir>))

#   COMMUNICATE(CONJUGATE(source),<cb>,<sgn>,<dir>)

# 

# If a COMMUNICATE is used, then  <cb>  must be specified with the corresponding

#   sign and direction

#

# If both sources use a COMMUNICATE, then the CB must be the same

# between them. The sgn and dir are allowed to be different.

#

# CONJUGATE or COMMUNICATE is not allowed on the target

#

# The option must always be specified and is

# <option> :== REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# NEIGHBOUR_ALL ( <source>, <target>,

#                 <checker board>, <duplex> )

#

# If <duplex> = FORWARD or BACKWARD this is logically

# equivalent to

#

#   FOR <direction> FROM 0 THROUGH Nd-1 DO

#     { NEIGHBOUR ( <source>(<direction>),

#                   <target>(<direction>),

#                   <checker board>, <duplex>, <direction> ); }

#

# whereas if <duplex> = BOTH it is logically equivalent to

#

#   FOR <direction> FROM 0 THROUGH Nd-1 DO

#     { NEIGHBOUR ( <source>(<direction>,F_SWAP),

#                   <target>(<direction>,F_SWAP),

#                   <checker board>, FORWARD, <direction> );

#       NEIGHBOUR ( <source>(<direction>,B_SWAP),

#                   <target>(<direction>,B_SWAP),

#                   <checker board>, BACKWARD, <direction> ); }

#

# This allows optimized communication by getting the neighbours

# in all directions at the same time (e.g., by using "Stencils"

# on the CM-2). The <checker board> argument is as for

# NEIGHBOUR, but the <source> and <target> must now be

# declared to be of the same PRIMITIVE type as <source>

# but with an ARRAY index of size Nd. For <duplex> = BOTH

# they must also have an array index of size 2.

#

# NEIGHBOUR ( <source>, <target>, <checker board>,

#             <duplex>, <direction>[,<log2 distance>] )

#

# This carries out the operation

#

#   FORALL x DO

#     <target>[x, <checker board>] :=

#         <source>[x' <duplex> <direction>,

#                  1 - <checker board>]

#

# where <direction> stands for a unit vector or index as

# appropriate (the usual abuse of notation), and

# (anti)periodic boundary conditions are implied. <source>

# and <target> must have trivial ARRAY types (i.e., either

# they have no ARRAY dimensions at all, or they are completely

# indexed). Here, x' is the source coordinate after shifting

# by 2^distance given in log2_distance. This means that if

# log2_distance = 0, then only nearest neighbour communication

# is used and for log2_distance > 0, neighbours from a power

# of 2 are used. Note, in this case, then the source and

# target are on the same checkerboard. Arguments <duplex>,

# <checker board>, <direction> and <log2 distance> are all

# simple FORTRAN INTEGER variables, with <checker board> = 0|1,

# <duplex> = FORWARD|BACKWARD, and 0 <= <direction> < Nd.

#

# All WORD, REALITY and PRIMITIVE types are supported.

# Obviously, only LATTICE GRID type is supported

#-

 

 

#+

# PARTOF operations: PARTOF ( <source>, <target>, <function> )

#

# where <function> := REAL_PART_OF and IMAG_PART_OF

#-

 

 

#+

# SEED_TO_FLOAT operations:

#   SEED_TO_FLOAT ( <source>, <target> )

#

# Converts seeds to floating point numbers

#-

 

 

#+

# SLICE_SUMSQ operations:

#   SLICE_SUMSQ ( <source>, <orthog direction>, <target>, <option> )

#   SLICE_SUM ( <source>, <orthog_direction>, <target>, <option> )

#

# where <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

# NOTE: this macro could be generalized to a list of

# orthogonal directions (a plane, a cube, etc)

 

 

#+

#

# SPECIAL operations:

#   SPECIAL ( <source1>, <source2>, ..., <sourceN> )

#

# Provides low level access to the primitive portion of an object.

# Should only be used when absolutely necessary to gain access to

# primitive indices. The macro removes the OUTER site level and

# starts a loop to the closing SPECIAL_END (if there is an outer

# index). A DEFINE_TYPE object is provided that has the primitive indices

# exposed.

#

#    SPECIAL(source)

#

# produces

#  primitive      new object                    info

#  SCALAR    -> _SP_source                   : primitive SCALAR

#  VECTOR(N) -> _SP_source(N)                : primitive SCALAR with array size N

#  MATRIX(N) -> _SP_source(N,M)              : primitive SCALAR with array size N,M

#  TRIANG(N) -> _DIAG_SP_source(N,2)         : primitive SCALAR with array N

#               _OFFD_SP_source(N*(N-1)/2,2) : primitive SCALAR with array N*(N-1)/2

#-

 

 

#+

# Special purpose macros for manipulating spinors

#-

#+

# SPIN_PROJECT ( <source>, <direction>, <PLUS/MINUS>, <target> )

#

# This carries out the operation

#

#   <target> := (1 <+/-> gamma[<direction>]) * <source>

#

# where <source> is of PRIMITIVE type MATRIX[<colour>,Ns]

# and <target> is of PRIMITIVE type MATRIX[<colour>,Ns/2]

# (i.e, COLOUR_SPIN and COLOUR_HALFSPIN respectively).

# <source> and <target> must have REALITY type COMPLEX_64,

# and conformant GRID and WORD types.

#

# The projection operator (1 <+/-> gamma[<direction>])

# is specified by the arguments <+/-> and <direction>

# both of which are simple FORTRAN INTEGER variables,

# with <+/-> = +/-1 and 0 <= <direction> < Nd.

#-

 

 

#+

# SPIN_PRODUCT ( <source>, <Dirac algebra element>, <target> )

#

# This carries out the operation

#

#   <target> := Gamma[<Dirac algebra element>] * <source>

#

# where <source> and <target> are of PRIMITIVE type

# MATRIX[<colour>,Ns] (i.e., COLOUR_SPIN). <source>

# and <target> must have REALITY type COMPLEX_64, and

# conformant GRID and WORD types.

#

# The argument <Dirac algebra element> is a simple

# FORTRAN INTEGER variable, lying between 0 and 15 inclusive.

#-

 

 

#+

# TRICK ( <source>, <direction>, <PLUS/MINUS>,

#         <target>, <option> )

#

# <source> is of PRIMITIVE type MATRIX[<colour>,Ns/2]

# and <target> is of PRIMITIVE type MATRIX[<colour>,Ns]

# (i.e., COLOUR_HALFSPIN and COLOUR_SPIN respectively).

# <source> and <target> must have REALITY type COMPLEX_64, and

# conformant GRID and WORD types. Arguments <+/-> and

# <direction> are both simple FORTRAN INTEGER variables,

# with <+/-> = +/-1 and 0 <= <direction> < Nd. The last argument

# specifies how <target> is to be modified; the choices are

#

#     <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

#

# The TRICK operation makes use of the fact that

# (1 <+/-> gamma[<direction>]) is a projection operator,

# and thus in a suitable spinor basis can be implemented

# using only the top two components of the spinors

# (SPIN_PROJECT) with the bottom two components being

# reconstructed by TRICK:

#

#     <target>[1] = <source>[1]

#     <target>[2] = <source>[2]

#     <target>[3] = a * <source>[1] + b * <source>[2]

#     <target>[4] = c * <source>[1] + d * <source>[2]

#

# with a, b, c, and d being suitable constants from the

# set {+/-1,+/-i}.

#-
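 

For illustration only (the spinor names psi, chi, and phi are hypothetical), a typical half-spinor step pairs the two macros as in the following sketch:

```
  /* Sketch: project to a half spinor, then reconstruct the full spinor.
     isign = +/-1 and mu is a direction with 0 <= mu < Nd. */
  SPIN_PROJECT(psi, mu, isign, chi);    /* chi := (1 +/- gamma[mu]) * psi, top Ns/2 components */
  TRICK(chi, mu, isign, phi, REPLACE);  /* phi := full Ns-component spinor rebuilt from chi */
```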

 

 

#+

# SUMSQ operations: SUMSQ ( <source>, <target>, <option> )

#                   SUM ( <source>, <target>, <option> )

#

# where <option> ::= REPLACE | NEGATE | ADD | SUBTRACT

#-

 

 

#+

# Special purpose macros for manipulating SU(N) matrices

#

# SU2_EXTRACT ( <source>, <su2 index>, <b0>, <b1>, <b2>, <b3> )

#

# Compute the b(k) of A_SU(2) = b0 + i sum_k bk sigma_k

#

#

# SUN_FILL ( <b0>, <b1>, <b2>, <b3>, <su2 index>, <target> )

#

# Fill in B from B_SU(2) = b0 + i sum_k bk sigma_k

#-
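 

A sketch of how the pair might be used together (the names w and v and the variable su2_index are hypothetical; what is done to b0..b3 in between is application-specific, e.g. an SU(2)-subgroup update):

```
  /* Sketch: extract an SU(2) subgroup, modify it, and embed it back */
  SU2_EXTRACT(w, su2_index, b0, b1, b2, b3);  /* pull out the SU(2) components b(k) */
  /* ... update b0, b1, b2, b3 here ... */
  SUN_FILL(b0, b1, b2, b3, su2_index, v);     /* fill v from the updated SU(2) block */
```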

 

 

#+

# TRACE operations: TRACE ( <source>, <target>, <option> )

#

# where <option> ::=

#   REAL_PART | IMAGINARY_PART | COMPLEX_PART

#-

 

 

#+

# TRACE_MULTIPLY operations:

#   TRACE_MULTIPLY ( <source1>, <source2>, <target>, <option> )

#

# where <option> ::= REAL_PART | IMAGINARY_PART | COMPLEX_PART

#-

 

 

#+

# UNARY operations: <function> ( <source>, <target> )

#

# where <function> ::= ABS | EXP | COS | SIN | LOG | ACOS |

#                      SQRT | ASIN | TAN | ATAN | NOT | SIGN

#-

 

Datatype structure

The basic type declaration in SZIN is DEFINE_TYPE, which defines a hierarchical type. Its general form is

DEFINE_TYPE(<name>, <word type>, <inner grid>, <reality>, <primitive>, <outer grid>, <array index>, ...)

and a type declaration of the form

LATTICE_COMPLEX_COLOUR_COLOUR_CB_DIRECTION(`U')

just expands into

DEFINE_TYPE(`U',`REAL_32',`ILATTICE',`COMPLEX_64', `MATRIX(`Nc',`Nc')',`SCALAR',`2',`Nd')

for a vector machine and

DEFINE_TYPE(`U',`REAL_32',`SCALAR',`COMPLEX_64', `MATRIX(`Nc',`Nc')',`OLATTICE',`2',`Nd')

for the Columbia Machine or the MIT SMP machine. The possible arguments for DEFINE_TYPE are

 

Word types:

      UNSIGNED_32

      INTEGER_32

      REAL_32

      REAL_64

      BOOLEAN

    

Inner grid types:

      SCALAR

      ILATTICE

 

Reality types:

      SCALAR

      COMPLEX_64

 

Primitive types:

      SCALAR

      SEED

      TRIANG(<dimension>)

      VECTOR(<dimension>)

      MATRIX(<dimension 1>, <dimension 2>)

      PRODUCT(<TYPE 1>, <TYPE 2>)

   

Outer grid types:

      SCALAR

      OLATTICE

   

Array types: up to 5 array indices are allowed (this is easily extended); they are 0-based, and the trailing arguments to DEFINE_TYPE are their dimensions.

 

The basic type structure is described in a fiber-bundle language: the base manifold is the lattice, and at each site there is a fiber carrying some type. Mathematically, the primitive types we want at a site are either simple scalar values, or live within the product of a vector space over colour components with a vector space over spin components. This is encapsulated in the PRODUCT primitive type above, where TYPE 1 and TYPE 2 are composed of the basic types listed before PRODUCT. So, generically we want

·        Scalars:        Scalar

·        Gammas:         Product(Scalar, Matrix(Ns))

·        Propagators:    Product(Matrix(Nc), Matrix(Ns))

 

where Nc indicates the dimension of the color vector space and Ns indicates the dimension of the spin vector space. For example, Scalars can multiply gauge fields or fermions. A gamma matrix can multiply a fermion, but not a gauge field. A gauge field can multiply a fermion, but a fermion cannot multiply another fermion unless there is a conjugation involved.

For example, a fermion can be laid out in memory as an Nc x Ns matrix, but mathematically a fermion is not equivalent to a matrix.
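 

Following the gauge-field expansions shown above, a lattice fermion declaration for a scalar machine might look like the following sketch (the name psi and the exact argument choices are illustrative, patterned on the DEFINE_TYPE expansion earlier in this section):

```
/* Hypothetical: a checkerboarded lattice fermion, stored with the
   MATRIX(Nc,Ns) primitive type, in the OLATTICE (scalar-machine) layout */
DEFINE_TYPE(`psi',`REAL_32',`SCALAR',`COMPLEX_64', `MATRIX(`Nc',`Ns')',`OLATTICE',`2')
```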

 

Example

/* @(#)polylp.m        11.2 (8/22/96) */

/* POLYLP -- calculate the global normalized sum of the Polyakov loop */

/* u -- gauge field (Read) */

/* poly_loop -- Polyakov loop average in direction mu (Write) */

/* mu  -- direction of Polyakov loop (Read) */

 

include(types.mh)

 

SUBROUTINE(polylp, u, poly_loop, mu)

 

LATTICE_COMPLEX_COLOUR_COLOUR_CB_DIRECTION(u);

DCOMPLEX(poly_loop);

INTEGER(mu);

{                           /* Local Variables */

  include(COMMON_DECLARATIONS);

 

  LATTICE_COMPLEX_COLOUR_COLOUR(u_dble);

  LATTICE_COMPLEX_COLOUR_COLOUR(poly);

  LATTICE_COMPLEX_COLOUR_COLOUR(tmp_1);

  LATTICE_COMPLEX(poly_trace);

  INTEGER(cb);

  INTEGER(n);

  DOUBLE(dummy);

 

  START_CODE;

 

  ALLOCATE(u_dble);

  ALLOCATE(poly);

  ALLOCATE(tmp_1);

  ALLOCATE(poly_trace);

 

  cb = 0;                   /* Start from even checkerboard */

 

  /* Construct double links */

  MULTIPLY(u(cb,mu), COMMUNICATE(u(1-cb,mu),cb,FORWARD,mu), u_dble, REPLACE);

 

  /* Multiply together to get the Polyakov loop */

  COPY(u_dble, poly, REPLACE);

 

  for(n = 2; n <= VALUE(nrow(mu))/2; ++n) /* runs over half the linear size */

  {   

    NEIGHBOUR(poly, tmp_1, cb, FORWARD, mu, 1);

    MULTIPLY(u_dble, tmp_1, poly, REPLACE);

  }

 

  /* Take the trace and sum up */

  TRACE(poly, poly_trace, COMPLEX_PART);

  SUM(poly_trace, poly_loop, REPLACE);

 

  FREE(poly_trace);

  FREE(tmp_1);

  FREE(poly);

  FREE(u_dble);

 

  dummy = TO_DOUBLE(1) / TO_DOUBLE(Nc*vol_cb);

  MULTIPLY(poly_loop, dummy, poly_loop, REPLACE);

 

  END_CODE;

}

 

 

Getting the source via CVS

The source code is contained in a CVS repository at JLab. Those with an account at JLab can get the code with a CVSROOT variable as below

 

setenv CVSROOT  :ext:jlabs1.jlab.org:/home/edwards/cvsroot

setenv CVS_RSH ssh

 

cvs checkout szin

 

The CVS home site with documentation is     http://www.cvshome.org

 

There is a useful tutorial/on-line book at http://cvsbook.red-bean.com/cvsbook.html

 

The code is also available by anonymous "pserver" access (NOTE: I'm still trying to get this to work). The source code is also available as a tarball on the web site.

Rather than wiping out your working directory each time you update to a new tarball, use CVS instead. This is called tracking a third-party source.

 

1) Go into your home directory (or wherever) and do a

% cvs init

 

This will create ~/cvsroot and ~/cvsroot/CVSROOT . This

is the location of your repository.

 

2) Get the tar from the web site and untar it in say /tmp. Then

% cd /tmp/szin

% cvs import szin RGE szin13-6   # the szin is your module name

                                 # RGE is the vendor name (something unique)

                                 # szin13-6 is the release tag. Actually you

                                 # make up all of these - nowhere are you

                                 # getting these names from me

% rm -rf /tmp/szin

% cd <whatever work directory you want>

% setenv CVSROOT  ~/cvsroot

% cvs checkout szin

% cd szin   # this is now your working directory

 

3) Say some time later you want to update to a newer version,

not necessarily the next release. Get the tar file and untar it

in say /tmp

% cd /tmp/szin

% cvs import szin RGE szin13-9

% rm -rf /tmp/szin

% cd <workdir>/szin

% cvs update

% <use all your favorite cvs commands>

 

Note this will now track changes. If you edit and commit some changes, and later I also make changes, you will merge them when you do the next import.

 

Note, with sufficient permissions on the repository files (like group access), anybody else there can also update your cvs source files.

 

Also note that the downside of this method is that you don't have the real repository history - you can't see the logs, etc., of why somebody here changed something.

 

Running the program under MPI

For compilation with MPI, the flags in Makefile.<machine_type> will probably need to be changed. At JLab, the MPI and LIBS flags are

 

MPI = /usr/local/mpich-gm

 

LIBS = -L$(MPI)/lib -L/usr/local/gm/binary/lib -L$(SMPDIR) -lmpich -lgm \

       -lQCDutils -lpthread -lots -lm

 

A sample configuration file (gm.conf) for MPI over Myrinet's GM is

2

hpci3a 2

hpci2a 2

 

To run the program at JLab, do

/usr/local/mpich-gm/bin/mpirun.ch_gm [-np #] --gm-v --gm-f gm.conf ../../main/tests/kxtsmp_t_mesplq_g -n 1
