Advanced Computing
The majority of the ongoing technical development work at RAP is
being done on PCs running the Linux
operating system. When projects have needed high-performance
compute platforms to run simulations such as MM5,
RAP has traditionally selected platforms from manufacturers such
as Cray, SGI, and DEC. RAP's Army-sponsored 4DWX
program, which requires multiple instances of the MM5 program running
at four of the ATEC ranges
throughout the country, has recently built a Linux cluster to address
the latest MM5 requirements.
The ATEC Linux cluster project is an attempt to move a computationally
intensive task from an SGI O2000 to a cluster of PCs running Linux. The idea
was inspired by the Beowulf
project and other general Linux clustering projects.
The lower cost per floating point operation afforded by a cluster
solution was the driving motivation for this work. RAP designed
and built from scratch an 8-node system with 4U rack-mountable hardware.
Each node consisted of dual 500 MHz Pentium II processors and 500
MB of RAM, and the nodes were interconnected with a Gigabit Ethernet
switch. The hardware itself cost $32K, and the assembly
required 4 days of labor. In contrast, an 8-node SGI O2000
cost roughly five times this amount at the
time of the build (October 1999).
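The cost argument can be made concrete with a rough calculation. The sketch below uses only figures quoted on this page: the $32K hardware cost, the "roughly five times" SGI price, and the estimated 10% MM5 speed increase reported under Current Status. The SGI price is therefore an approximation rather than a quoted number, the labor cost of assembly is ignored, and the variable and function names are illustrative only.

```python
# Rough price/performance comparison using the figures quoted in the text.
# The SGI price is only given as "about five times" the cluster cost, so
# sgi_cost is an approximation, not a quoted price.

cluster_cost = 32_000          # USD, hardware only (from the text)
sgi_cost = 5 * cluster_cost    # USD, approximate ("about five times")
cluster_speed = 1.10           # relative MM5 throughput (estimated benchmark)
sgi_speed = 1.00               # SGI O2000 taken as the baseline


def cost_per_unit_throughput(cost, speed):
    """Dollars paid per unit of relative MM5 throughput."""
    return cost / speed


cluster_ratio = cost_per_unit_throughput(cluster_cost, cluster_speed)
sgi_ratio = cost_per_unit_throughput(sgi_cost, sgi_speed)
advantage = sgi_ratio / cluster_ratio

print(f"cluster: ${cluster_ratio:,.0f} per unit of throughput")
print(f"SGI:     ${sgi_ratio:,.0f} per unit of throughput")
print(f"price/performance advantage: about {advantage:.1f}x")
```

Under these assumptions the cluster delivers a given amount of MM5 throughput for between one fifth and one sixth of the hardware cost. The result is only as good as the estimated inputs.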
Current Status
- MM5 versions 2.0 and 3.0 are both running successfully on
the cluster, with a 10% speed increase over the SGI using Gigabit
(1000BASE-T) networking between the nodes. The current benchmark
figures are estimates, but are believed to be fairly accurate.
- The "Scali" networking system from Dolphin Interconnect Solutions
will be tested in mid-March 2000. Dolphin
has indicated that its product is capable of a 10x speedup in
network throughput compared to the current solution and
scales much better to a large number of nodes.
- The MM5 pre- and post-processing software is being ported from
the SGI system. It is estimated that this work will be completed
by the end of March 2000.
Notes about Parallel Computing:
- In general, parallel computing is a method of problem solving
in which a single problem is divided into parts that are
distributed to a number of "engines" or "nodes" and worked on
simultaneously. In the ideal case, a run that takes time T on one
node takes roughly T/n on n nodes. If faster execution is required,
additional nodes can be added to the cluster, although communication
between nodes limits how far this scales.
- In order to succeed with this concept, it was necessary to
move away from the "multiprocessor" version of MM5 which was running
on the SGIs and to investigate a new "distributed" version of
MM5, which was under development within NCAR's MMM
division. MMM's port of MM5 to MPI
made it possible to run this model on the cluster.
- For the most part, MM5 parallelizes well. The problem
space can be divided into smaller partitions, and the work
done in each partition is fairly independent, so it can be executed
with minimal overhead.
- One of the motivators for choosing a PC solution for
this parallel processing problem was that Intel
continues to demonstrate rapid improvement in price/performance
(compute power per dollar).
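The partitioning idea in the notes above can be illustrated with a small sketch. It is not taken from MM5 or from MMM's MPI port: it simply splits the rows of a hypothetical model grid into contiguous slabs, one per node, and uses a toy timing model in which compute time shrinks as 1/n while a fixed communication cost per step does not. The function names and the `comm_fraction` parameter are assumptions made for illustration.

```python
def partition_rows(n_rows, n_nodes):
    """Split n_rows grid rows into n_nodes contiguous (start, stop) slabs.

    Slab sizes differ by at most one row so the load stays balanced.
    """
    base, extra = divmod(n_rows, n_nodes)
    slabs = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional row each.
        size = base + (1 if node < extra else 0)
        slabs.append((start, start + size))
        start += size
    return slabs


def estimated_speedup(n_nodes, comm_fraction=0.0):
    """Toy model: compute time scales as 1/n, communication cost is fixed.

    comm_fraction is a hypothetical overhead, expressed as a fraction of
    the single-node run time, paid whenever more than one node is used.
    """
    comm = comm_fraction if n_nodes > 1 else 0.0
    parallel_time = 1.0 / n_nodes + comm
    return 1.0 / parallel_time


if __name__ == "__main__":
    # Eight nodes, as in the cluster described above; 100 rows is arbitrary.
    for node, (start, stop) in enumerate(partition_rows(100, 8)):
        print(f"node {node}: rows {start}..{stop - 1}")
    print(f"ideal speedup on 8 nodes: {estimated_speedup(8):.1f}")
    print(f"with 5% overhead:         {estimated_speedup(8, 0.05):.1f}")
```

With no overhead the model gives the ideal 8x speedup on 8 nodes. Even a small fixed communication cost lowers that figure noticeably, which is why the speed of the interconnect between nodes matters.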
Official MM5 benchmarks:
- This is a work in progress. Initial results should be
available here by the end of March 2000.
- Benchmarks for the MPI version of MM5 which were run on other
architectures can be found here.