Information Technology

Database Management
Communications and Networking
Advanced Computing
Database Management

The Army sponsored 4DWX program is the first attempt in RAP at using a database to manage most of the primary data in a major operational system.  At the onset of the program, an ambitious course was laid out which called for applying database technologies to all of the data sets, including the large 3D output volumes generated from the MM5 community mesoscale modeling system.  After a period of extensive testing, it was determined that performance issues and implementation complexity compelled a scaled-down plan for the database, restricting it to the collection of measured observation data sets needed in the 4DWX system.  These include all surface, upper-air, and corresponding pseudo-observations (POBS) falling within MM5's domain 2 nest.   POBS are created by interpolating MM5 output to each sampled location for all of the measured observations collected in the system, and provide a convenient dataset for computing MM5 validation statistics and model-based climatology.

Relational Database

The 4dwx system uses a relational database management system (RDBMS) to store data online. The decision to use a relational database was based on several factors:

  • A location was needed to store the system data.
  • A flexible mechanism was needed to handle retrieval requests from researchers for climatology, forecasting, and model validation.
  • There was a need to be able to store and retrieve data using a variety of programming languages, such as C/C++, Perl and Java, without using a proprietary or legacy interface.
  • One of the system requirements is to be able to efficiently handle a large amount of data.
A relational database meets all of these needs. Relational databases store data, using tables to hold the data and relations to link data from one table to another. Most relational databases use SQL, a standardized language for querying databases.  There are also standard interfaces for database programming, such as ODBC and JDBC. Databases can handle large amount of data, and by using indexes, can retrieve individual records at high speeds.

There are some drawbacks to using relational databases. There is extra administrative overhead involved, including installing and maintaining the database software, designing, creating and maintaining the database objects, and tuning the database for performance.

About MySQL

Initially, the program opted for the Oracle RDBMS, for both its popularity in the commercial database world, and for its perfomance, reliability and scalability.  At the time, it was the only database that could properly handle the need for representing upper air data in the binary large object (BLOB) form.  A switch was recently made to use the MySQL database, both for cost and performance reasons.  Recent versions of MySQL now meet all of the current program needs and incurs a lower administration overhead.

MySQL is a very fast, efficient and reliable database server, and is relatively simple to maintain compared to most other database systems.  As the name implies, MySQL uses SQL for its query language. Drivers are available for both ODBC and JDBC, and perl modules are available as well.  Since JDBC is used to access the database,  all of the 4DWX database ingest and retrieval codes from Oracle were ported to MySQL with very little effort.

MySQL is available for free for non-commercial use, and commercial license costs are very reasonable.

More information on MySQL can be found at http://www.mysql.org

4dwx Data

The types of data stored in the 4DWX database include local observations, NWS observations, and model ouput interpolated to the observation locations (POBS).  The specific types of observation data stored in the system are:

  • SAMS
  • METAR
  • RAWS
  • Soundings
  • Profilers
  • Lightning
These data are loaded into the database by ingesters running continuously on the 4DWX server host.  The database is constantly growing, as the data are not scrubbed.  A rough approximation of the amount of data stored in the database is on the order of 25 GB per year, for each of the five Army test ranges scheduled for implementation by the end of FY 00.

Retrieval Interface

One of the drawbacks to developing applications using a relational database is that detailed knowledge of the database schema is required by an applications programmer.  Changes to the design of the database will have an effect on every application that uses the database.

In order to circumvent this problem, a higher-level interface has been developed for data retrieval and all applications which tie into the database.  Thus, if the underlying structure of the database changes, then only the components of the retrieval interface need to be updated. Documentation for the retrieval interface can be found at http://atec-server/classdoc.

Servlets Data servlets are also under development as part of the 4DWX data management infrastructure.  Servlets are java classes which are executed by a web server when certain requests are received.  These data servlets were written using the 4DWX Retrieval Interface and allow applications to request data using the http protocol.  A principal advantage of this design is that is enables access to the Army test range databases which reside behind firewalls.

Ingesters 4DWX Ingesters were written in Perl or Java. These programs load the data into the database tables, using SQL and either Perl's DBI interface or Java's JDBC interface.

Eventually all of the ingesters will be written in Java, and they will be controlled by a multi-threaded ingest process controller.

Updated 5/28/2000