Network Monitoring


The Network Monitoring Task Force (NMTF) was chartered to investigate tools and techniques for monitoring the traffic, reliability, and consistency of connections between ESnet sites and sites of interest on the greater Internet.

The initial work has focused on the end-to-end performance monitoring tools originally developed at the Stanford Linear Accelerator Center (SLAC). The mechanism used (ICMP ping) has been validated by demonstrating that measurements made with it correlate with Web HTTP application response. A data collection architecture was defined that allows several collection sites, each of which monitors multiple remote sites of interest to it. The data from the collection sites is gathered via the Web and saved at an archive site. The archived data is analyzed at the archive site, and reports are made available via the Web. In addition, a new data recording format that accommodates multiple collection sites was defined and documented, and the archive site now provides access to the data via a simple-to-use Web form.

Currently there are 14 collection sites in 8 countries serving the ESnet and HENP community. About 480 links in 22 countries are monitored, providing measures of round-trip response time, packet loss, unpredictability, unreachability, and how busy the network is. Remote sites were chosen by polling the research community and/or sampling network accounting data. HEPNRC at FNAL operates the archive site. Tools have been developed and are in production for reviewing the data cached at the collection sites, and for selecting links and time frames for which to plot response time and packet loss from the data saved at the archive site. The tools are collectively known as PingER and have been packaged and documented so that other groups can use them, either as individual items or as a collection.
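As an illustration of the kind of per-link measures described above, the following minimal Python sketch summarizes one batch of ping results into packet loss, average round-trip time, and an unreachability flag. It is not the PingER code: the function name, the use of None to mark a lost packet, and the rule that a host counts as unreachable when every packet in the batch is lost are all assumptions made for this example.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of pings to a single remote host.

    Each entry is a round-trip time in milliseconds, or None for a
    packet that got no reply (an assumed convention for this sketch).
    """
    sent = len(rtts_ms)
    if sent == 0:
        raise ValueError("no pings in batch")
    received = [r for r in rtts_ms if r is not None]
    return {
        "sent": sent,
        "received": len(received),
        # Percentage of packets that got no reply.
        "loss_pct": 100.0 * (sent - len(received)) / sent,
        # Average round-trip time over the replies only.
        "avg_rtt_ms": mean(received) if received else None,
        # Assumed rule: unreachable when every packet was lost.
        "unreachable": not received,
    }


# Hypothetical batch: ten pings, two of them lost.
batch = [42.0, 45.0, None, 48.0, 41.0, None, 44.0, 40.0, 43.0, 47.0]
summary = summarize_pings(batch)
print(summary)
```

Summaries of this kind, computed per link and per time interval, are the sort of data that could then be aggregated into the monthly figures reported below.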

The results indicate that, by most measures, performance within ESnet is excellent to good. One of the main causes of poor performance between ESnet and other sites is the presence of congestion points, so peering is critical. ESnet traffic accepted from major HENP labs is growing by 2.5-6% per month; nevertheless, response time is improving by 1-2% per month, and packet loss between SLAC and other sites is improving by 3% per month. Packet loss performance between ESnet and the Internet at large is, on average, poor or worse for the hosts monitored. Packet loss seen from SLAC for non-ESnet hosts improved dramatically between April and June 1996, and the improvement has been sustained. In general, performance is extremely variable in both the short and long term, particularly for international hosts. From SLAC, average monthly response times by host group are typically 300-500 ms for international hosts, 150-220 ms for eastern N. American hosts, 80-140 ms for western N. American hosts, and 40-50 ms for ESnet hosts.
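To put the monthly rates above on an annual footing, the short Python sketch below compounds a monthly percentage change over twelve months. It assumes each rate holds steadily and compounds month over month, which the measurements do not necessarily guarantee; the function name is invented for this example. An improvement (a falling response time or loss figure) is entered as a negative rate.

```python
def annual_change(monthly_pct: float, months: int = 12) -> float:
    """Percent change after compounding monthly_pct for `months` months.

    A negative monthly_pct models a quantity that shrinks each month,
    such as a response time that is improving.
    """
    return ((1.0 + monthly_pct / 100.0) ** months - 1.0) * 100.0


# Monthly rates taken from the text; improvements are entered as negatives.
rates = [
    ("traffic growth, low end", 2.5),
    ("traffic growth, high end", 6.0),
    ("response time, low end", -1.0),
    ("response time, high end", -2.0),
    ("packet loss", -3.0),
]

for label, rate in rates:
    print(f"{label}: {rate:+.1f}%/month -> {annual_change(rate):+.1f}%/year")
```

Under these assumptions, traffic growth of 2.5-6% per month corresponds to roughly 34-101% per year, while a 3% monthly improvement in packet loss corresponds to a reduction of about 31% over a year.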

The methodology is also being used to select Internet Service Providers (ISPs) and monitor their performance, possibly with a view to writing a service contract; to help decide which universities to connect directly to ESnet; and to identify bottlenecks in order to decide where to focus efforts and work-arounds.

Considerable interest and requests for information have come from ESnet-related groups such as the International Committee on Future Accelerators (ICFA), as well as from outside the ESnet community. In particular, the Cross Industry Working Team (XIWT) Internet Performance Working Team (IPWT) has chosen the PingER tools as the basis for its Internet monitoring. PingER is now also installed at about a dozen XIWT sites, including HP, Intel, SBC, Digital, NIST, CNRI, and GTE. Presentations have been made at various federal network meetings (EOWG, CCIRN), and three papers have been submitted for publication in the last 6 months.