################################################################################
# README:  General Information for Building and Installing the JAVA PBS Utilities
#
# Walt Akers - 2002
# Thomas Jefferson National Accelerator Facility
################################################################################

Step 1) PBSClassLib

   Most of these Java classes rely on the PBS utility library that is part of
   Jefferson Lab's UnderLord project. Download the newest version of the
   UnderLord Scheduler from the following web site:

      http://www.jlab.org/hpc/UnderLord

   Install and build the UnderLord system in a common directory. The java
   directory containing this distribution should be installed directly under
   the underlord directory, for example:

      /usr/local/underlord-1.7/java

Step 2) MySQL

   A) All of these utilities require MySQL. Once you have installed MySQL,
      create the data tables used by these utilities with the following
      MySQL commands.

   -------------------------------------------------------------------------
   CREATE DATABASE PBS;
   USE PBS;

   CREATE TABLE Jobs
      (
      Job           CHAR(50) NOT NULL,
      Cluster       CHAR(50),
      RecordTime    DATETIME,
      CreationTime  DATETIME,
      QueueTime     DATETIME,
      EligibleTime  DATETIME,
      StartTime     DATETIME,
      EndTime       DATETIME,
      User          CHAR(50),
      Grp           CHAR(50),
      JobName       CHAR(50),
      Queue         CHAR(50),
      ArchReq       CHAR(50),
      CpuTimeReq    INT UNSIGNED,
      WallTimeReq   INT UNSIGNED,
      FileSizeReq   INT UNSIGNED,
      NodeCntReq    INT UNSIGNED,
      ProcCntReq    INT UNSIGNED,
      NodesReq      CHAR(255),
      Nodes         LONGTEXT,
      ExitCode      INT,
      CpuTimeUsed   INT UNSIGNED,
      WallTimeUsed  INT UNSIGNED,
      MemUsed       INT UNSIGNED,
      VMemUsed      INT UNSIGNED,
      PRIMARY KEY (Job)
      );

   CREATE TABLE Clusters
      (
      Cluster       CHAR(50) NOT NULL,
      Description   LONGTEXT,
      PRIMARY KEY (Cluster)
      );

   CREATE TABLE Nodes
      (
      Node          CHAR(50) NOT NULL,
      StartDate     DATETIME DEFAULT NULL,
      EndDate       DATETIME DEFAULT NULL,
      Cluster       CHAR(50),
      Active        BOOL DEFAULT 'TRUE',
      Type          ENUM ('Cluster', 'Time-Shared') DEFAULT 'Cluster',
      ProcessorCnt  INT UNSIGNED DEFAULT '0',
      Keywords      CHAR(255),
      PRIMARY KEY (Node)
      );

   CREATE TABLE NodeAssignments
      (
      Job           CHAR(50) NOT NULL,
      Node          CHAR(50) NOT NULL,
      ProcessorCnt  INT UNSIGNED,
      PRIMARY KEY (Job, Node)
      );

   CREATE TABLE NodeStates
      (
      Node          CHAR(50) NOT NULL,
      StartTime     DATETIME DEFAULT 'NOW' NOT NULL,
      State         ENUM ('Online', 'Reserved', 'Offline', 'Down', 'Unknown')
                    DEFAULT 'Unknown' NOT NULL,
      PRIMARY KEY (Node, StartTime)
      );

   GRANT ALL PRIVILEGES ON PBS.* TO (Your Name)@localhost
      IDENTIFIED BY (Your MySQL Password) WITH GRANT OPTION;
   -------------------------------------------------------------------------

   B) From within MySQL, add an entry for each PBS Server to the Clusters
      table. The MySQL command for server 'hpcfs1' will look like this:

         INSERT INTO Clusters SET Cluster="hpcfs1",
                                  Description="PBS Server hpcfs1";

Step 3) Environment Variables

   A) Define TOMCAT_HOME to point to the Tomcat home directory.

Step 4) Site Specific Modifications

   A) Edit PBSDatabase/Jobs/SQLAccounting.java
         On line 112, change the host in the dbURL variable to identify
         your MySQL host.
         On line 113, insert your database username for the dbUsername value.
         On line 114, enter your database password for the dbPassword value.

   B) Edit PBSDatabase/Nodes/PBS_NodeTable.java
         On line 335, change the host in the dbURL variable to identify
         your MySQL host.
         On line 336, insert your database username for the dbUsername value.
         On line 337, enter your database password for the dbPassword value.

   C) Edit packages/PBSUtilization/SQLUtil.java
         On line 151, change the host in the dbURL variable to identify
         your MySQL host.
         On line 152, insert your database username for the dbUsername value.
         On line 153, enter your database password for the dbPassword value.

   D) Create the file packages/util/hosts.cfg containing the name of the
      PBS Server whose utilization will be charted. If you will be charting
      the utilization of more than one server, put each host on a line by
      itself.
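
To illustrate the hosts.cfg layout described in Step 4 D (one PBS Server name per line), here is a minimal Java sketch that reads such a file and prints the hosts it finds. The class HostsConfig and its methods are hypothetical and are not part of this distribution. Skipping blank lines and trimming whitespace is an assumption made for the sketch; the servlets themselves may be stricter, so keep the real file to exactly one host per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking a hosts.cfg file; not part of the distribution.
final class HostsConfig {
    private HostsConfig() {}

    // Treats each non-blank line as one host name (assumption: surrounding
    // whitespace and empty lines carry no meaning).
    static List<String> parseHosts(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    // Reads the file at the given path and returns the host names it lists.
    static List<String> readHosts(String path) throws IOException {
        return parseHosts(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from Step 4 D; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "packages/util/hosts.cfg";
        for (String host : readHosts(path)) {
            System.out.println(host);
        }
    }
}
```

Run from the underlord/java directory, this prints one line per server. Each printed name should also have a matching row in the Clusters table from Step 2 B.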
Step 5) Build and install everything on your webserver host

   A) In the underlord directory, as root, 'make all install'
   B) In the underlord/java/PBSDatabase directory, as root, 'make all install'
   C) In the underlord/java/packages directory, as root, 'make all install'

Step 6) Start the accounting applications

   These programs monitor the PBS nodes and jobs and regularly update the
   database to reflect the current statistics. They require a CLASSPATH
   containing mm.mysql.jdbc-1.2c and PBSInterface.jar. Because a native
   library is used, the environment variable LD_LIBRARY_PATH must also be
   set to include the directory containing the library libPBSInterface.so.

   A) Start SQLAccounting on the PBS Server host, as root, by typing:

         java SQLAccounting

      Typically in PBS, the path to the accounting files will be
      /usr/spool/PBS/server_priv/accounting. If you are running the
      SQLAccounting program on a remote system, you will need to NFS mount
      the PBS Server's accounting files at a location where they can be
      read by this application. If the program is monitoring the activity
      of more than one PBS Server, multiple paths can be passed on the
      command line.

   B) Start PBS_NodeAccounting, as root, by typing:

         java PBS_NodeAccounting

Step 7) Update the $(TOMCAT_HOME)/conf/server.xml file

   A) Add a line to the server.xml file telling it about the /util servlets.
      The line might look like this...

Step 8) Update the $(APACHE_HOME)/conf/httpd.conf file

   A) Add a line to the httpd.conf file telling it about the /util servlets.
      The line might look like this...

         WebAppDeploy util warpConnection /util/

Finally) If all goes well:

   The web application:
      http://webserver/util/servlet/util.GetUtil
   should provide a graph of weekly, monthly and yearly system utilization.

   The web application:
      http://webserver/util/servlet/util.GetUserUtil?user=jones
   should provide a graph of user jones' weekly, monthly and yearly
   utilization.
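
As a quick way to confirm that the two web applications named above respond, the following Java sketch builds both URLs and prints the HTTP status each one returns. It is only an illustration: the class UtilServletCheck and its methods are hypothetical, the host name "webserver" and user "jones" are the placeholders used in this README, and the five-second timeouts are arbitrary.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical checker for the /util servlets; not part of the distribution.
final class UtilServletCheck {
    private UtilServletCheck() {}

    // URL of the system-wide utilization graph.
    static String systemUrl(String webserver) {
        return "http://" + webserver + "/util/servlet/util.GetUtil";
    }

    // URL of the per-user utilization graph; the user name is URL-encoded.
    static String userUrl(String webserver, String user) {
        try {
            return "http://" + webserver + "/util/servlet/util.GetUserUtil?user="
                    + URLEncoder.encode(user, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Issues a GET request and returns the HTTP status code.
    static int status(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String webserver = args.length > 0 ? args[0] : "webserver";
        String user = args.length > 1 ? args[1] : "jones";
        for (String url : new String[] { systemUrl(webserver), userUrl(webserver, user) }) {
            System.out.println(url + " -> HTTP " + status(url));
        }
    }
}
```

A status of 200 means the servlet answered. A 404 points back to the httpd.conf and server.xml changes in Steps 7 and 8.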
Troubleshooting)

1) Always look to the database first

   A) Is MySQL installed?

   B) Can you log into MySQL using the username and password that you
      defined earlier?

   C) Once in MySQL, test for the existence of the PBS database by typing:
         use PBS;

   D) Does the MySQL command "SHOW COLUMNS FROM Jobs;" show the same
      structure defined in Step 2?

   E) Does the command "SHOW COLUMNS FROM Clusters;" show the same
      structure defined in Step 2?

   F) Does the command "SHOW COLUMNS FROM Nodes;" show the same structure
      defined in Step 2?

   G) Does the command "SHOW COLUMNS FROM NodeAssignments;" show the same
      structure defined in Step 2?

   H) Does the command "SHOW COLUMNS FROM NodeStates;" show the same
      structure defined in Step 2?

   I) Does the command "SELECT * FROM Clusters;" return a record (or
      records) with the names of the PBS Servers? If not, you'll need to
      add a record for each PBS Server to the Clusters table.

   J) Does the command "SELECT * FROM Nodes;" return a record for each
      compute node in the system, and is the information consistent with
      the node configuration? If not, then the PBS_NodeAccounting
      application is not updating the database. Check the dbURL,
      dbUsername and dbPassword used by this application and make sure
      that the program is running.

   K) Run the command "SELECT * FROM NodeStates;"
      There should be an entry identifying the state of each node in the
      system. If not, then check the PBS_NodeAccounting application.

   L) Run the command "SELECT * FROM Jobs;"
      A list of all jobs (both current and historic) should be displayed.
      If no jobs are listed (and jobs have indeed run on the system), then
      the SQLAccounting program is at fault.
         Make sure SQLAccounting is running.
         Make sure the dbURL, dbUsername, and dbPassword are correct.
         Make sure the PBS accounting directory is accessible.
         Make sure this directory was passed as an argument to
         SQLAccounting.

   M) Run the command "SELECT * FROM NodeAssignments;"
      A list of jobs and the nodes that they were assigned to should be
      displayed. If this data is not present (and jobs have indeed run on
      the system), then check the SQLAccounting program.

2) Calling the servlet http://webserver/util/servlet/util.GetUtil returns a
   404 Not Found error.

   A) Make sure the address is correct.
   B) Make sure that the Apache httpd.conf file has been updated.
   C) Make sure that the Tomcat server.xml file has been updated.
   D) Make sure that Apache and Tomcat were restarted after the
      configuration files were updated.
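
Several of the database checks in item 1 come down to whether SQLAccounting and PBS_NodeAccounting are running. If either program will not start at all, one thing worth ruling out is the CLASSPATH and LD_LIBRARY_PATH setup from Step 6. The sketch below is a hypothetical preflight check, not part of this distribution. It assumes the MySQL JDBC driver class is named org.gjt.mm.mysql.Driver (pass a different class name as the first argument if your driver jar uses another one) and that the JVM derives java.library.path from LD_LIBRARY_PATH, as it does on Linux.

```java
import java.io.File;

// Hypothetical preflight check for the Step 6 environment; not part of the distribution.
final class Preflight {
    private Preflight() {}

    // Returns true if the named class can be located on the current CLASSPATH.
    static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, Preflight.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Searches a path-separator-delimited directory list for the given file name.
    // Returns the first match, or null if the file is not found.
    static File findLibrary(String searchPath, String fileName) {
        if (searchPath == null) {
            return null;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed driver class name; override with the first argument.
        String driver = args.length > 0 ? args[0] : "org.gjt.mm.mysql.Driver";
        System.out.println("JDBC driver " + driver + ": "
                + (classAvailable(driver) ? "found" : "NOT on CLASSPATH"));

        // Maps "PBSInterface" to the platform file name (libPBSInterface.so on Linux).
        String libName = System.mapLibraryName("PBSInterface");
        File lib = findLibrary(System.getProperty("java.library.path"), libName);
        System.out.println(libName + ": "
                + (lib != null ? lib.getPath() : "NOT on java.library.path"));
    }
}
```

Run it with the same CLASSPATH and LD_LIBRARY_PATH that you use to start the accounting programs. If both lines report a hit, a startup failure is more likely to lie in the dbURL, dbUsername and dbPassword values from Step 4.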