Tutorial 1: Using and Programming QDP++ and Chroma

Introduction

This tutorial takes you through the basics of programming with Chroma and linking against the Chroma library. It also shows you how we deal with XML in code.

Getting the chroma package

The chroma package can be obtained from the chroma website. There are various ways of getting the packages, for example as a “tar-ball” or using CVS; we have already downloaded both chroma and qdp++ into the SciDAC/ subdirectory of your account.

Using a version of chroma you have just built

If you have just built and installed chroma yourself, you will have different source, build and installation directories. We will refer to the source directory as the place where the sources are and the install directory as the place where the executables are. The build directory is the place where you built chroma (the directory where you type make and make install). When you build your own codes, the executables get built in the

mainprogs/main

and

mainprogs/tests

subdirectories of the build directory tree. The chroma executable lives in mainprogs/main/.

If you have already typed make install you will find all the executables copied to prefix/bin, where prefix is the installation directory you specified with the --prefix= configure option. If you didn't explicitly specify a prefix directory it defaults to /usr/local/.

Using the pre-compiled version

Building chroma is quite slow, and therefore we have a pre-compiled version that is installed in /opt/scidac.  The executables are in the bin/ subdirectory, the include files are in the include directory, whilst the chroma library is in the lib subdirectory.  In this tutorial, the emphasis will be on employing chroma as a library, in a “main” program written by you; there is also a chroma executable, together with various others, that can perform various inline measurements, and we will touch on this other use later.

DOWN TO WORK

Setting up a working directory

Make a directory to work in. Let's call it tut1. You should then cd into it.

Getting the files

Download the files for this tutorial from this link.

Unpack the files with the following command:

gunzip tut1_files.tar.gz
tar xvf tut1_files.tar

Now enter the working directory by typing:

cd Tutorial3

You should find several files in this directory:

input.xml

The input XML file for this tutorial

propagator_0

A test propagator for this tutorial

test_purgaug.cfg1

A V=4x4x4x8 test lattice – included with chroma

Makefile

A Makefile to build the tutorial main program

tut1.cc

A skeleton Chroma application we will use during this tutorial

Makefile

First let us look at (and edit) the Makefile. Bring it up in emacs.

  • The first line defines a Makefile variable called CHROMA, containing the path to the chroma installation. We need to fix this so that it points to the installation on the tutorial machines. You need to substitute your chroma installation directory here. If you used the prebuilt packages this should be:

/opt/scidac/

  • The second line sets the location of the chroma-config program. This is based below the path defined in the CHROMA variable, so you don't need to change it.

  • The next four lines define Makefile variables:

    • CXX -- The C++ compiler used to compile chroma
    • CXXFLAGS -- The compiler flags to compile chroma
    • LDFLAGS -- Additional linker flags to link chroma executables
    • LIBS -- external libraries that need to be linked with Chroma

In the case of CXXFLAGS we also add the option -I. which instructs the compiler to look in the current directory for header files too. In this tutorial we don't have extra header files so this is a little redundant.

Note that these variables are set using commands of the form:

$(shell $(CONFIG) --cxx)

This instructs make to execute the command line following the string shell. In this case it will run the chroma-config program with various flags to determine characteristics of the installation needed to work with Chroma. If you change the CHROMA variable, this Makefile should work even on the QCDOC as long as you use GNU Make.

You can use this command yourself without putting it in a Makefile. Try typing:

/opt/scidac/bin/chroma-config

to get a list of options. Now try to use some of them:

/opt/scidac/bin/chroma-config --cxx

The version that is installed is for a scalar build, without any communication.  Were you to build this for, say, the QCDOC, or the Cray XT3/4, the libraries, compiler options etc would be much more extensive.

  • We have an empty HDRS variable, since we don't have extra headers.
  • We have a variable called OBJS that defines the object files used in our exercise. Currently we have only one object file: tut1.o which will be created from the source file tut1.cc.

  • We have the first rule of our makefile:

tut1: $(OBJS)
   $(CXX) -o $@ $(CXXFLAGS) $(OBJS) $(LDFLAGS) $(LIBS)

Hidden at the start of the second line is a TAB character, not spaces.

This rule tells make that the target tut1 depends on the files listed in the OBJS variable, and that it is built by calling the command in the CXX variable with the arguments that follow. The automatic variable $@ expands to the name of the target, and the other variables expand to their earlier definitions.

Basically, this rule tells make how to build our test executable.

  • We have a rule to make .o object files from .cc source files:

%.o: %.cc $(HDRS)
   $(CXX) $(CXXFLAGS) -c $<

Here the automatic variable $< specifies the input file (the first prerequisite), and as with all make rules, the processing instruction has a TAB character in front of it.

  • Finally we have a rule for cleaning up. This is the rule that gets executed when you type make clean.

So this makefile will first try to make tut1.o, as it is a dependency of the tut1 target. It uses the pattern rule for making .o files from .cc files. Finally, it creates the executable using the rule for tut1. When you type make without arguments, the first target is processed, which in this case is tut1.

Compiling and Running the Tutorial

Once you have changed the CHROMA macro in line one, you can just type make to build the executable. You should see output similar to

g++ -I/usr/local/share/chroma-3.22.10/scalar/include -msse2 -msse -O2 -finline-limit=50000 -march=pentium4 -I/usr/local/share/qdp++-1.23.2/scalar/include -I/usr/include/libxml2 -I. -c tut1.cc

In file included from /usr/local/share/chroma-3.22.10/scalar/include/chromabase.h:15,

                 from /usr/local/share/chroma-3.22.10/scalar/include/chroma.h:29,

                 from tut1.cc:6:

/usr/local/share/qdp++-1.23.2/scalar/include/qdp.h:168:2: warning: #warning "Using scalar architecture"

g++ -o tut1 -I/usr/local/share/chroma-3.22.10/scalar/include -msse2 -msse -O2 -finline-limit=50000 -march=pentium4 -I/usr/local/share/qdp++-1.23.2/scalar/include -I/usr/include/libxml2 -I. tut1.o -L/usr/local/share/chroma-3.22.10/scalar/lib -L/usr/local/share/qdp++-1.23.2/scalar/lib -L/usr/lib -lchroma -llevel3 -lqdp -lXPathReader -lxmlWriter -lqio -llime -L/usr/lib -lxml2 -lz -liconv -lm -lgmp -lgmp

Info: resolving _xmlFree by linking to __imp__xmlFree (auto-import)

Info: resolving _xmlIndentTreeOutput by linking to __imp__xmlIndentTreeOutput (auto-import)

 

You can now run the executable like you do with the stock applications:

./tut1 -i input.xml -o output.xml

and you should see the following output:

Hello World
Finished init of RNG
Finished lattice layout
Starting up unit gauge (free) config
Single Precision Read
QIO_read_finished

and an output.xml file should have appeared in your local directory which we will inspect later.

 

This exercise shows, firstly, how to compile a chroma program using a Makefile, and secondly that the actions of chroma programs are controlled using XML, such as input.xml.  Indeed, it is possible to do much of the physics you want simply by using the chroma executable, with its actions controlled by the XML files.

Hacking Chroma

Bring the tut1.cc file up in emacs and let us look through it. Lines beginning with // or text between /* and */ are comments.

First we include the file chroma.h which contains many definitions. Its location is given to the compiler by the -I flags given by chroma-config --cxxflags.

We tell the compiler that we are using the Chroma namespace. Namespaces exist so that other code can use the same names we do, but in its own namespace.

At this point it becomes useful to follow the comments in the code, and I'll just give a cursory overview of what is happening in the code. If you have done any C/C++ work this should be easy.

On line 16 we define a structure called Param_t which will be used to hold parameters. Currently the only parameter is

multi1d<int> nrow;

The multi1d is a C++ templated type for 1d arrays. It can have its size reset. The template parameter int between the < and > signifies that this will be an array of integers. It will be called nrow which is the conventional name for an array that holds the lattice size.
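To make the idea of a resizable, templated 1D array concrete, here is a minimal sketch in plain C++. The class Array1d and the helpers make_nrow and volume are hypothetical stand-ins invented for this illustration; they are not the QDP++ multi1d implementation, and only mimic the resize, size and indexing behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a resizable 1D array; NOT the QDP++ multi1d class.
template <typename T>
class Array1d {
public:
  Array1d() = default;
  explicit Array1d(std::size_t n) : data_(n) {}

  // Reset the number of elements; new elements are value-initialised.
  void resize(std::size_t n) { data_.resize(n); }

  std::size_t size() const { return data_.size(); }

  // Element access with a bounds check in debug builds.
  T& operator[](std::size_t i) {
    assert(i < data_.size());
    return data_[i];
  }
  const T& operator[](std::size_t i) const {
    assert(i < data_.size());
    return data_[i];
  }

private:
  std::vector<T> data_;
};

// Build the lattice-size array for a 4x4x4x8 lattice (the test lattice size).
inline Array1d<int> make_nrow() {
  Array1d<int> nrow;  // starts empty
  nrow.resize(4);     // one extent per space-time direction
  nrow[0] = 4;
  nrow[1] = 4;
  nrow[2] = 4;
  nrow[3] = 8;
  return nrow;
}

// The lattice volume is the product of the extents.
inline long volume(const Array1d<int>& nrow) {
  long v = 1;
  for (std::size_t i = 0; i < nrow.size(); ++i) v *= nrow[i];
  return v;
}
```

Here make_nrow() returns an array of size 4 holding the extents of the 4x4x4x8 test lattice, and volume() multiplies them together.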

On line 23, we define a struct called Prop_t which will hold in itself a single string, the name of the propagator file. This string will be called prop_file and has C++ type std::string (or string from the std namespace).

On line 30 we define the complete input structure for the code called App_input_t. This consists of the previous Param_t, Prop_t structs and a struct called Cfg_t which is defined already in the chroma library.

Reading XML

We need to define XML reading routines for the structs we have just defined. This is done in the functions on lines 39, 47 and 56. Note that the functions all have the same name: read. The first two inputs are the same in all three cases but the third argument is different. This is a technique called function overloading in C++.
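As a reminder of how overloading works, the following sketch defines three functions called read that differ only in the type of their third argument. ToyDoc and demo_overloads are hypothetical names for this illustration, and a std::map stands in for the XML document, so this is not how the QDP++ readers are implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy "document": maps a path to the text found there.
using ToyDoc = std::map<std::string, std::string>;

// Overload 1: the third argument is an int.
void read(const ToyDoc& doc, const std::string& path, int& value) {
  value = std::stoi(doc.at(path));
}

// Overload 2: the third argument is a double.
void read(const ToyDoc& doc, const std::string& path, double& value) {
  value = std::stod(doc.at(path));
}

// Overload 3: the third argument is a string.
void read(const ToyDoc& doc, const std::string& path, std::string& value) {
  value = doc.at(path);
}

// Usage: the compiler picks the overload from the type of the third argument.
inline void demo_overloads() {
  const ToyDoc doc{{"/foo/bar/fred", " 5 "}, {"/foo/name", "fred"}};
  int n = 0;
  std::string s;
  read(doc, "/foo/bar/fred", n);  // selects the int overload
  read(doc, "/foo/name", s);      // selects the string overload
  assert(n == 5);
  assert(s == "fred");
}
```

The caller writes the same function name every time; only the destination variable changes.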

Note that in the read routines, the first two arguments are an object of type XMLReader& and a const string& path. The XMLReader& is a reference to an XML document container, which allows us to read from the document. The path is an XPath expression. Consider the following XML snippet:

<?xml version="1.0"?>
<foo>
  <bar>
    <fred> 5 </fred>
  </bar>
</foo>

then we can select the value 5 using the XPath expression:

/foo/bar/fred

We could do this in Chroma with the commands:

int integer;
read(xml_reader, "/foo/bar/fred", integer);

Where the xml_reader object would be of type XMLReader and should contain the XML document in question. Because the reader function is declared as:

void read(XMLReader& reader, const std::string& path, int& i);

the &s mean that the arguments are passed by reference and are potentially modifiable (with the exception of the path, which is declared const, i.e. immutable).

We used the XPath expression /foo/bar/fred. This is an absolute path. We can also do the equivalent of cd-ing into the bar tag, and perform relative queries from there:

XMLReader reader2(xml_reader, "/foo/bar");
int integer;
read(reader2,"fred",integer);

This technique is useful if a particular tag has several subtags of interest, and also to process individual groups. We use this trick on lines 41, 49 and 58 of our example code, to select the groups of interest for reading into individual structs.

Our technique, in lieu of data binding, is to build up recursive read functions using relative queries. This can be seen around line 56, where we need just 3 reads to read in each part of the App_input_t structure, using the readers that we have just defined or that are already defined in the chroma library.
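The recursive reading pattern can be sketched without Chroma at all. In the following, ToyReader is a hypothetical stand-in for XMLReader that looks up leaf values in a std::map, and the ToyParam, ToyProp and ToyInput structs and the tag names are invented for illustration. The sketch only mimics the idea of constructing a sub-reader from a parent and a path and then issuing relative queries.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy reader; NOT the QDP++ XMLReader. Leaf values are keyed by
// absolute path, and prefix_ plays the role of the "current directory".
class ToyReader {
public:
  explicit ToyReader(std::map<std::string, std::string> leaves)
      : leaves_(std::move(leaves)) {}

  // "cd" into a path of an existing reader.
  ToyReader(const ToyReader& parent, const std::string& path)
      : leaves_(parent.leaves_), prefix_(parent.resolve(path)) {}

  // Absolute paths start with '/'; anything else is relative to prefix_.
  std::string resolve(const std::string& path) const {
    if (!path.empty() && path[0] == '/') return path;
    return prefix_ + "/" + path;
  }

  // Return the text at a path, throwing a std::string if nothing matches.
  const std::string& text(const std::string& path) const {
    const auto it = leaves_.find(resolve(path));
    if (it == leaves_.end()) throw std::string("no match for " + resolve(path));
    return it->second;
  }

private:
  std::map<std::string, std::string> leaves_;
  std::string prefix_;
};

// Leaf readers.
void read(const ToyReader& r, const std::string& path, int& v) {
  v = std::stoi(r.text(path));
}
void read(const ToyReader& r, const std::string& path, std::string& v) {
  v = r.text(path);
}

// Illustrative structs (names and members are assumptions of this sketch).
struct ToyParam { int nt = 0; };
struct ToyProp { std::string prop_file; };
struct ToyInput { ToyParam param; ToyProp prop; };

// Each struct reader makes a sub-reader and issues relative queries.
void read(const ToyReader& r, const std::string& path, ToyParam& p) {
  const ToyReader sub(r, path);
  read(sub, "nt", p.nt);
}
void read(const ToyReader& r, const std::string& path, ToyProp& p) {
  const ToyReader sub(r, path);
  read(sub, "prop_file", p.prop_file);
}
// The top-level reader just delegates to the readers for its parts.
void read(const ToyReader& r, const std::string& path, ToyInput& in) {
  const ToyReader top(r, path);
  read(top, "Param", in.param);
  read(top, "Prop", in.prop);
}

// Usage: one call reads the whole input structure.
inline ToyInput demo_read() {
  std::map<std::string, std::string> leaves;
  leaves["/tutorial/Param/nt"] = "8";
  leaves["/tutorial/Prop/prop_file"] = "propagator_0";
  const ToyReader doc(leaves);
  ToyInput in;
  read(doc, "/tutorial", in);
  assert(in.param.nt == 8);
  return in;
}
```

Adding a new member to one of the structs then only requires one extra relative read in that struct's own reader.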

Exceptions

Occasionally things we may try and do are not possible. At times like this we need to interrupt what we are doing to report the error condition. This is done in C++ with the mechanism of exceptions. For example if we cannot open a file, or cannot match an XPath expression, we "throw" an exception. This exception can be caught by the program and handled gracefully. If it is not caught, then generally the program aborts.

For example, if the read in line 50 fails an exception will occur. This function does not handle the exception, so it will propagate up to the read defined on line 56, which is the only one to call the original read. We can see beginning on line 61 the following construct:

try {
// Do stuff
}
catch( const string& e)
{
// Do this stuff if a string exception occurs

}

What happens here is that the routines in the try block are executed, and if any of them throw an exception of type string that exception will be caught, and an appropriate error will be displayed. The XMLReaders only throw exceptions of type string. The strings contain error messages.

There can be other types of exceptions, e.g. file opening exceptions, casting exceptions and memory allocation exceptions, that are more than just strings. Generally, for each different type of exception you need a separate catch block. However, there is a catch-all form which will catch all exceptions. An example of this is given on line 102:

catch(...) // Catch all types of exceptions
{
// Handle exceptions here
}
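The following plain C++ sketch puts these pieces together: a function that throws a std::string, an intermediate function that lets the exception propagate, and a caller with both a typed catch block and a catch-all. The names ToyTable, lookup, lookup_doubled, report and demo_exceptions are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using ToyTable = std::map<std::string, int>;

// Throws a std::string carrying an error message if the key is missing.
int lookup(const ToyTable& table, const std::string& key) {
  const auto it = table.find(key);
  if (it == table.end()) throw std::string("no match for " + key);
  return it->second;
}

// Does not catch anything, so an exception from lookup() propagates upwards.
int lookup_doubled(const ToyTable& table, const std::string& key) {
  return 2 * lookup(table, key);
}

// Handles string exceptions specifically and everything else generically.
std::string report(const ToyTable& table, const std::string& key) {
  try {
    // An empty key triggers an exception that is NOT a std::string.
    if (key.empty()) throw std::runtime_error("empty key");
    return "value " + std::to_string(lookup_doubled(table, key));
  }
  catch (const std::string& e) {
    return "string exception: " + e;
  }
  catch (...) {
    return "unknown exception";
  }
}

// Usage: the three possible outcomes.
inline void demo_exceptions() {
  const ToyTable table{{"fred", 5}};
  assert(report(table, "fred") == "value 10");
  assert(report(table, "barney") == "string exception: no match for barney");
  assert(report(table, "") == "unknown exception");
}
```

Note that the catch blocks are tried in order, so the catch-all has to come last.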

Main Program, Initialization and Ending

Once we have defined our XML readers, we get on to the main program proper on line 83, declared as:

int main(int argc, char **argv)

which is a standard declaration that should be familiar to you if you have ever used C. The argc is the number of command line arguments, and the argv is an array of C strings (char*s) holding the actual arguments.

On line 86 we have

Chroma::initialize(&argc,&argv)

which initialises QDP++ for us, and also grabs the input and output filenames out of the command line arguments.

Paired with this is a

Chroma::finalize()

at the end of the program.
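To give a feel for what grabbing the filenames out of the command line involves, here is a rough sketch in plain C++. It is not the Chroma implementation: ToyFileNames and parse_io_args are hypothetical, only the -i and -o flags used earlier are handled, and the default names DATA and XMLDAT are assumptions of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the two filenames. The defaults are assumptions.
struct ToyFileNames {
  std::string input = "DATA";
  std::string output = "XMLDAT";
};

// Scan the arguments for "-i <file>" and "-o <file>"; ignore everything else.
ToyFileNames parse_io_args(int argc, char** argv) {
  assert(argc == 0 || argv != nullptr);
  ToyFileNames names;
  // Stop one short of the end so that a flag always has a value after it.
  for (int i = 1; i + 1 < argc; ++i) {
    const std::string flag = argv[i];
    if (flag == "-i") {
      names.input = argv[++i];   // consume the value as well
    } else if (flag == "-o") {
      names.output = argv[++i];  // consume the value as well
    }
  }
  return names;
}
```

With the command line ./tut1 -i input.xml -o output.xml this sketch would return input.xml and output.xml; with no flags it falls back to the assumed defaults.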

Near line 89, there is a function call START_CODE() paired with a function END_CODE() near the end. These functions invoke the profiling functionality of QDP++ if that has been enabled. You are encouraged to put START_CODE() and END_CODE() calls in later functions that you may write yourself.

The rest of the code

Lines 90-110 open our XML reader to read the input XML for the application.

·  We instantiate the App_input_t struct

·  We instantiate the XMLReader on line 98

·  We try to open the user-specified input file, or DATA, on lines 99-107, catching errors as we go. Note the use of Chroma::getXMLInputFileName() -- read the comments for an explanation

·  We write Hello World to the terminal. Note we don't use the standard cout but rather the one defined in the QDPIO namespace:

QDPIO::cout << "Hello World" << endl;

The endl asks for a newline character. Using this special QDPIO::cout on a parallel machine results in only one processor writing to the terminal.

·  Around line 114, we read the application input structure from the root tag <tutorial2b>.

·  Lines 118 and 122 initialise the lattice layout from the XML that has been parsed into the App_input_t struct input making use of the nrow parameters:

Layout::setLattSize(input.param.nrow);

// Initialise
Layout::create();

This always needs to be done before you do anything lattice-y.

·  Lines 131-141 read the gauge configuration specified in the <Cfg> tag of the input XML file. We now commonly use XML headers in our configuration files. Further, if using the SciDAC format, there are also file related XML headers too. The gaugeStartup function returns these in the readers gauge_file_xml and gauge_xml. The actual gauge field has type:

multi1d<LatticeColorMatrix> u(Nd);

In other words it is a 1D array (with Nd elements) of LatticeColorMatrix objects. Each array element represents the SU(3) link fields in one of the Nd directions.

Finally the unitarity of the field is checked in line 141.

·  Lines 150-161 read a SciDAC LIME format propagator from the disk. This is similar to reading the gauge fields, but with different function calls.

·  There is some amount of XML writing following all this; we will come back to it in the next section.

·  On line 199 we measure the plaquette and link trace, after which there is room for you to add your own code.

Writing XML

QDP++ provides Chroma with three ways of writing XML:

  • XMLFileWriter -- These objects write to files
  • XMLBufferWriter -- These objects write to memory - ie to strings
  • XMLArrayWriter -- These can be used to write sequences separated by <elem> tags

The writers support two kinds of operations:

·  push() and pop() operations. These open up and close group XML tags respectively. So

XMLFileWriter foo("my_file.xml"); // Opens a new XML file called my_file.xml
push(foo,"Root");
pop(foo);

will open an XML file. The push will open a tag: <Root> and the pop will close it by writing </Root>.

·  write() -- These operations will write data to the XML file. The instruction

write(foo, "Barf", 5);

will write into the file controlled by the writer foo the following XML:

<Barf>5</Barf>

The operations can be combined to produce fully fledged XML documents:

XMLFileWriter foo("my_file.xml"); // Opens a new XML file called my_file.xml
push(foo,"Root");
write(foo, "Barf", 5);
pop(foo);

should produce:

<?xml version="1.0" ?>
<Root>
  <Barf>5</Barf>
</Root>

in the file my_file.xml.
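The way push and pop pair up is easy to see in a toy writer that keeps a stack of open tags. The sketch below is plain C++ and writes to a string; ToyXMLWriter and demo_writer are hypothetical, the operations are member functions rather than the free functions used above, and no XML header line is emitted, so this is only an illustration of the idea and not the QDP++ implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy writer; NOT one of the QDP++ XML writers.
class ToyXMLWriter {
public:
  // Open a group tag and remember it on the stack.
  void push(const std::string& tag) {
    indent();
    out_ << "<" << tag << ">\n";
    open_.push_back(tag);
  }

  // Close the most recently opened group tag.
  void pop() {
    assert(!open_.empty());  // a pop without a matching push is a bug
    const std::string tag = open_.back();
    open_.pop_back();
    indent();
    out_ << "</" << tag << ">\n";
  }

  // Write a leaf element holding any streamable value.
  template <typename T>
  void write(const std::string& tag, const T& value) {
    indent();
    out_ << "<" << tag << ">" << value << "</" << tag << ">\n";
  }

  std::string str() const { return out_.str(); }

private:
  // Two spaces of indentation per open tag.
  void indent() { out_ << std::string(2 * open_.size(), ' '); }

  std::ostringstream out_;
  std::vector<std::string> open_;
};

// Usage: the same push / write / pop sequence as in the text.
inline std::string demo_writer() {
  ToyXMLWriter foo;
  foo.push("Root");
  foo.write("Barf", 5);
  foo.pop();
  return foo.str();
}
```

Because the open tags live on a stack, every pop closes the innermost group, which is why pushes and pops must balance.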

Additionally, you can use XML writers to spew back the contents of readers. This will strip the readers of their XML headers.

XMLReader in("file.xml");
XMLFileWriter out("outfile.xml");
push(out, "Root");
write(out, "Fred", in);
pop(out);

should spew the contents of file.xml into outfile.xml with a root tag of Root and surrounded by tags <Fred>   </Fred>.

Some functions write their own tags, and do not need surrounding push and pop calls. Examples of this are:

  • proginfo() on line 174
  • MesPlq() on line 199

Sometimes the underlying file stream of an XMLFileWriter is buffered, so although you write to it, the output may not immediately appear in the resulting file. If at this point your code crashes, it may not get written out at all. You can explicitly flush the streams associated with a writer by calling their member function flush. This may help you track down more easily where crashes have occurred, or make output from long calculations more "regular". You can see examples of this on lines 195 and 200 of the example program tut1.cc.

At this point we have covered everything in the tut1.cc file and it is time for a few exercises.

Exercise 1: XML Processing

Look at the output file we produced earlier: output.xml. Correlate the tags therein with the push(), pop(), write(), proginfo() and MesPlq() functions.

Now look at the input file input.xml, and correlate the tags there with the reader functions and XPath expressions.

Exercise 2: More XML Processing

Make the read function for the Param_t struct read in your name from tags <my_name> and </my_name>, and write it out.

You will need to add a member to the Param_t struct near line 18. This should be of type std::string. You will need to add a line near line 50, to read the new tag. In addition, you will need to add the tag to the input.xml file.

Now remake the program and run it, and check for your name both in the output XML and on the screen.

Exercise 3: Reading the gauge configuration

Look for the tag <Cfg> in input.xml: you will see that it specifies a unit gauge configuration, so the subsequent filename is ignored. Change the <cfg_type> to SZIN, and the file name to “test_purgaug.cfg1”. Now rerun the code: the plaquette-measurement routine will now return non-unit values for the plaquettes. This is the configuration used to generate the quark propagator we shall employ below.

Exercise 4: Computing the pion correlator

Find the part of the program marked out with comments:

// Do something exciting here yourself.
// Suggested exercise: Compute the zero mom pion on the propagator you
// have read.

We will write our code below this. You should add in the code, into the file as we go along.

The program will have read a propagator for you into an object called quark_propagator. One thing we can do with this is to use it to compute the pseudoscalar correlator:

C(t) = sum_{x} Tr gamma_5 \bar{G}^\dagger(0,x) gamma_5 G(0,x)

where G(0,x) is the quark propagator, \bar{G} is the antiquark propagator defined below, the dagger is the adj() that appears in the code, and the sum runs over the spatial sites of timeslice t. While it is straightforward to optimise the pion to be just the norm of the propagator, let's work with this form, since it leaves room for more generalisation.

Let us form the antiquark propagator:

\bar{G}(0,x) = gamma_5 G(0,x) gamma_5

we can do this in chroma as follows:

LatticePropagator anti_quark = Gamma(15)*quark_propagator*Gamma(15);

Now we need to trace this with the original propagator inserting gamma_5 matrices. We can do this in Chroma with the line:

LatticeComplex traced_props = trace( Gamma(15)*adj(anti_quark)*Gamma(15)*quark_propagator);

 

The Gamma(15) is probably rather mysterious. Chroma encodes the 16 basic gamma matrices \gamma_1^{n1} \gamma_2^{n2} \gamma_3^{n3} \gamma_4^{n4} by the integer M = n1 + 2 n2 + 4 n3 + 8 n4, where all of the n’s are 0 or 1. Thus \gamma_5 = \gamma_1 \gamma_2 \gamma_3 \gamma_4 has n1 = n2 = n3 = n4 = 1 and corresponds to Gamma(15).
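The encoding is just a 4-bit binary number, with one bit per gamma matrix, so that M = n1 + 2 n2 + 4 n3 + 8 n4. A small sketch in plain C++ makes this explicit; the helper names gamma_index and gamma_power are hypothetical and are not part of Chroma or QDP++.

```cpp
#include <cassert>

// Integer label M for gamma_1^{n1} gamma_2^{n2} gamma_3^{n3} gamma_4^{n4},
// where each of n1..n4 is 0 or 1.
constexpr int gamma_index(int n1, int n2, int n3, int n4) {
  return n1 + 2 * n2 + 4 * n3 + 8 * n4;
}

// Inverse map: the power (0 or 1) of gamma_mu, mu = 1..4, in the product
// labelled by M. This simply extracts bit (mu - 1) of M.
inline int gamma_power(int M, int mu) {
  assert(M >= 0 && M < 16);
  assert(mu >= 1 && mu <= 4);
  return (M >> (mu - 1)) & 1;
}

// gamma_5 = gamma_1 gamma_2 gamma_3 gamma_4 has all four bits set.
static_assert(gamma_index(1, 1, 1, 1) == 15, "gamma_5 should map to 15");
// gamma_4 alone sets only the highest bit.
static_assert(gamma_index(0, 0, 0, 1) == 8, "gamma_4 should map to 8");
```

In the same way the unit matrix is index 0 and, for example, the product \gamma_1 \gamma_2 is index 3.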

Now we need to sum this over spatial sites only, to give us a correlation function. Usually we also need to multiply the correlation function by momentum phase factors before summing. Chroma has special technology to do this: an object called SftMom, which stands for Slow Fourier Transform Momenta. This object can precompute the momentum phase factors, and use them to Fourier transform correlation functions like traced_props. Let us create such an object with the phases for ZERO momentum. In Chroma this is accomplished by

SftMom phases(0, true, Nd-1);

This creates the momentum phases, with a maximum mom^2 of zero. The true argument asks for equivalent momenta to be averaged, which is not relevant in this case. The Nd-1 nominates direction Nd-1=3 as the time direction (remember these are indexed from 0).
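At zero momentum every phase factor is 1, so the transform reduces to a plain sum over the spatial sites of each timeslice. The sketch below shows that sum in plain C++ on a flat array. The function timeslice_sum is hypothetical, and the site ordering (time as the slowest-running index) is an assumption of this sketch; SftMom takes care of the real lattice layout for you.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Sum a complex field over all spatial sites, separately for each timeslice.
// Assumed layout: site index = x_spatial + spatial_volume * t.
std::vector<std::complex<double>> timeslice_sum(
    const std::vector<std::complex<double>>& field,
    std::size_t spatial_volume, std::size_t nt) {
  assert(field.size() == spatial_volume * nt);
  std::vector<std::complex<double>> corr(nt);  // starts as all zeros
  for (std::size_t t = 0; t < nt; ++t) {
    for (std::size_t x = 0; x < spatial_volume; ++x) {
      // At zero momentum the phase exp(i p.x) is 1, so we just accumulate.
      corr[t] += field[x + spatial_volume * t];
    }
  }
  return corr;
}
```

For non-zero momenta each term would be multiplied by its phase factor before being accumulated, which is the part that SftMom precomputes.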

Let us now Fourier transform our traced propagators, summing over space:

multi2d<DComplex> hsum; // A 2D array of double prec complex numbers
// One dimension is indexed by momenta
// The other is the timeslice

hsum = phases.sft(traced_props); // Apply the fourier transform

The array hsum is indexed as:

hsum[ momentum_index ][ t ];

with the number of momenta determined from the SftMom class by its member function numMom(). Further we can convert the momentum index to the actual 3-vector momentum, using the member function numToMom(). Let us use these to write out our correlator into an XML file. We will use an XMLArrayWriter to write this out.

// Create an XMLArrayWriter to write out an array of numMom() elements
XMLArrayWriter momenta(xml_out, phases.numMom());

push(momenta, "PseudoScalar"); // Array will be called PseudoScalar

// loop over all the momenta
for(int i=0; i < phases.numMom(); i++) {

  // Special XMLArrayWriter instruction to start a new array element
  push(momenta);

  // Write out the momentum index
  write(momenta, "mom_index", i);

  // Write out the 3 momentum corresponding to this index
  write(momenta, "mom", phases.numToMom(i));

  // Write correlator
  write(momenta, "correlator", hsum[i]);

  // We are done with this element
  pop(momenta);
}

pop(momenta); // Pop the toplevel tag

If you have been adding all this code as we went along you should now save your program, and run make to remake it. Now run it (with input and output XML files) and look at the output XML file. It should have at its end the following snippet:

<PseudoScalar>
<elem>
   <mom_index>0</mom_index>
   <mom>0 0 0</mom>
   <correlator>
    <elem>
     <re>0.147564</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.00923146</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.0028872</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.00109399</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.000683482</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.00110199</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.00286503</re>
     <im>0</im>
    </elem>
    <elem>
     <re>0.00919643</re>
     <im>0</im>
    </elem>
   </correlator>
</elem>
</PseudoScalar>

Exercise 5: Shifts

An important feature of Chroma/QDP++ is that the underlying communication is hidden from the user. Furthermore, the shift is “intelligent”: QDP++ knows how to apply shifts to the basic types such as colour matrices, propagators, lattice complexes, etc. To see how this works, try to recompute the pion correlator as above, but shifted in the +i direction, where i is a spatial index. Do this in two ways:

1.     Shift both the quark and antiquark fields in the +i direction, contract them, and then perform the time-sliced sum.

2.     Contract the quark and antiquark fields, shift the result in the +i direction, and then perform the time-sliced sum.

Comparing with the earlier (unshifted) calculation, do your results exhibit the expected behaviour?  If you’re feeling adventurous, try looking at some other correlators, including cross-correlators.  In particular, with the given propagator, we can use both the \psibar \gamma_i \psi and \psibar D_i \psi for the interpolating operator at the sink. Try to compute both of these, and see that the cross-correlator with a pseudoscalar at the source is indeed zero.
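If you want a mental model of what a shift does to the data before trying it in QDP++, here is a one-dimensional sketch in plain C++ with periodic boundaries. The function shift_periodic is hypothetical, and the convention used (a forward shift gives dest(x) = src(x+1)) is an assumption of this sketch that you should check against the QDP++ documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Periodic shift of a 1D field by one site.
// sign = +1: dest[x] = src[(x + 1) mod L]   (assumed "forward" convention)
// sign = -1: dest[x] = src[(x - 1) mod L]
std::vector<double> shift_periodic(const std::vector<double>& src, int sign) {
  assert(sign == +1 || sign == -1);
  const std::size_t L = src.size();
  std::vector<double> dest(L);
  for (std::size_t x = 0; x < L; ++x) {
    // Add L before subtracting so the unsigned index never underflows.
    const std::size_t from = (sign > 0) ? (x + 1) % L : (x + L - 1) % L;
    dest[x] = src[from];
  }
  return dest;
}
```

On a parallel machine the sites that wrap around or cross a processor boundary are the ones that need communication, which is exactly the part that QDP++ hides from you.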

Fin

You should now be able to read a lot of the measurement code, understand basic XML manipulation, write your own readers, writers and measurements, and link against an installed library.