Information Technology

Advanced Computing

The majority of the ongoing technical development work at RAP is being done on PCs running the Linux operating system.  When projects have needed high-performance compute platforms to run simulations such as MM5, RAP has traditionally selected platforms from manufacturers such as Cray, SGI, and DEC.  RAP's Army-sponsored 4DWX program, which requires multiple instances of the MM5 program running at four of the ATEC ranges throughout the country, has recently built a Linux cluster to address the latest MM5 requirements.

The ATEC Linux cluster project is an attempt to move a computationally intensive task from an SGI O2000 to a cluster of PCs running Linux.  The idea was inspired by the Beowulf project and other general Linux clustering projects.  The lower cost per floating point operation afforded by a cluster solution was the driving motivation for this work.  RAP designed and built an 8-node system from scratch using 4U rack-mountable hardware.  Each node consisted of dual 500 MHz Pentium II processors and 500 MB of RAM, and the nodes were interconnected with a Gigabit Ethernet switch.  The hardware itself cost $32K, and the assembly required 4 days of labor.  In contrast, an 8-node SGI O2000 cost somewhere in the vicinity of five times this amount at the time of the build (October 1999).
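
As a rough illustration of the cost comparison above, the arithmetic can be sketched in Python.  The $32K hardware cost and the 8-node count come from the text; treating "in the vicinity of five times" as exactly 5x is an assumption made only for this example, and the function name cost_summary is hypothetical.

```python
# Figures quoted in the text.
CLUSTER_COST_USD = 32_000   # hardware cost of the Linux cluster
CLUSTER_NODES = 8

# Assumption: "in the vicinity of five times" is treated as exactly 5x.
SGI_COST_FACTOR = 5


def cost_summary(cluster_cost, nodes, sgi_factor):
    """Return per-node cluster cost and the implied approximate SGI cost."""
    return {
        "cost_per_node": cluster_cost / nodes,
        "approx_sgi_cost": cluster_cost * sgi_factor,
    }


if __name__ == "__main__":
    summary = cost_summary(CLUSTER_COST_USD, CLUSTER_NODES, SGI_COST_FACTOR)
    print(f"Cluster cost per node:  ${summary['cost_per_node']:,.0f}")
    print(f"Approximate SGI cost:   ${summary['approx_sgi_cost']:,.0f}")
```

Under that assumption the cluster works out to about $4,000 per node, against roughly $160,000 for the comparable SGI system.  Labor for assembly is not included in either figure.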

Current Status

  • MM5 versions 2.0 and 3.0 are both running successfully on the cluster, with a 10% speed increase over the SGI when using Gigabit Ethernet (1000BASE-T) networking between the nodes.  The current benchmark figures are estimates, but fairly accurate.

  • The "Scali" networking system from Dolphin Interconnect Solutions is scheduled for testing in mid-March 2000.  Dolphin has indicated that its product can deliver a 10x increase in network throughput over the current solution and scales much better to a large number of nodes.

  • The MM5 pre- and post-processing software is being ported from the SGI system.  This work is expected to be completed by the end of March 2000.
Notes about Parallel Computing:
  • In general, parallel computing is a method of solving a problem by dividing it into many parts and distributing those parts to a number of "engines" or "nodes".  Ideally, a run that takes time T on one node then takes roughly T/n on n nodes, so execution can be made faster by adding nodes to the cluster.  In practice, communication between nodes and any work that cannot be divided limit the gain.

  • In order to succeed with this concept, it was necessary to move away from the "multiprocessor" version of MM5 which was running on the SGIs and to investigate a new "distributed" version of MM5, which was under development within NCAR's MMM division. MMM's port of MM5 to MPI made it possible to run this model on the cluster.

  • For the most part, MM5 parallelizes well.  The problem space can be divided cleanly into smaller partitions.  The work done in each partition is fairly independent and can be executed with minimal overhead.
     
  • One of the motivations for choosing a PC solution for this parallel processing problem was that Intel continues to deliver rapid improvements in compute power per dollar.
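
The notes above can be illustrated with a minimal Python sketch.  It is not taken from MM5 or from MMM's MPI port: the function names, the 100-row grid, and the parallel fractions are all hypothetical, chosen only to show how a problem space might be split into near-equal partitions and how the speedup from adding nodes is bounded (Amdahl's law) when some of the work cannot be divided.

```python
def split_rows(total_rows, n_nodes):
    """Divide total_rows grid rows into n_nodes contiguous, near-equal bands.

    Returns a list of (start, stop) half-open row ranges, one per node.
    """
    base, extra = divmod(total_rows, n_nodes)
    bands = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes each take one additional row.
        stop = start + base + (1 if node < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def estimated_speedup(n_nodes, parallel_fraction):
    """Amdahl's law: speedup when only part of the work can be distributed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_nodes)


if __name__ == "__main__":
    # Hypothetical 100-row grid spread over an 8-node cluster.
    for node, (start, stop) in enumerate(split_rows(100, 8)):
        print(f"node {node}: rows {start}..{stop - 1}")

    # Hypothetical parallel fractions; the actual value for MM5 is not stated.
    for fraction in (1.0, 0.95):
        speedup = estimated_speedup(8, fraction)
        print(f"parallel fraction {fraction:.2f}: speedup {speedup:.2f}x")
```

With a fully divisible workload, 8 nodes give the ideal 8x speedup; if only 95% of the work can be distributed, the same 8 nodes give a little under 6x.  This is why how well a model parallelizes matters as much as the number of nodes.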
Official MM5 benchmarks:
  • This is a work in progress.  Initial results should be available here by the end of March 2000.

  • Benchmarks for the MPI version of MM5 that were run on other architectures can be found here.

Updated 5/28/2000