Database
Management
The Army sponsored 4DWX program
is the first attempt in RAP at using a database to manage most of
the primary data in a major operational system. At the onset
of the program, an ambitious course was laid out which called for
applying database technologies to all of the data sets, including
the large 3D output volumes generated from the MM5
community mesoscale modeling system. After a period of extensive
testing, it was determined that performance issues and implementation
complexity compelled a scaled-down plan for the database, restricting
it to the collection of measured observation data sets needed in
the 4DWX system. These include all surface, upper-air, and
corresponding pseudo-observations (POBS) falling within MM5's domain
2 nest. POBS are created by interpolating MM5 output
to each sampled location for all of the measured observations collected
in the system, and provide a convenient dataset for computing MM5
validation statistics and model-based climatology.
Relational Database
The 4dwx system uses a relational database management system (RDBMS)
to store data online. The decision to use a relational database
was based on several factors:
- A location was needed to store the system data.
- A flexible mechanism was needed to handle retrieval requests
from researchers for climatology, forecasting, and model validation.
- There was a need to be able to store and retrieve data using
a variety of programming languages, such as C/C++, Perl and Java,
without using a proprietary or legacy interface.
- One of the system requirements is to be able to efficiently
handle a large amount of data.
A relational database meets all of these needs. Relational databases
store data, using tables to hold the data and relations to link data
from one table to another. Most relational databases use SQL, a standardized
language for querying databases. There are also standard interfaces
for database programming, such as ODBC and JDBC. Databases can handle
large amount of data, and by using indexes, can retrieve individual
records at high speeds.
There are some drawbacks to using relational databases. There is
extra administrative overhead involved, including installing and
maintaining the database software, designing, creating and maintaining
the database objects, and tuning the database for performance.
About MySQL
Initially, the program opted for the Oracle RDBMS, for both its
popularity in the commercial database world, and for its perfomance,
reliability and scalability. At the time, it was the only
database that could properly handle the need for representing upper
air data in the binary large object (BLOB) form. A switch
was recently made to use the MySQL database, both for cost and performance
reasons. Recent versions of MySQL now meet all of the current
program needs and incurs a lower administration overhead.
MySQL is a very fast, efficient and reliable database server, and
is relatively simple to maintain compared to most other database
systems. As the name implies, MySQL uses SQL for its query
language. Drivers are available for both ODBC and JDBC, and perl
modules are available as well. Since JDBC is used to access
the database, all of the 4DWX database ingest and retrieval
codes from Oracle were ported to MySQL with very little effort.
MySQL is available for free for non-commercial use, and commercial
license costs are very reasonable.
More information on MySQL can be found at http://www.mysql.org
4dwx Data
The types of data stored in the 4DWX database include local observations,
NWS observations, and model ouput interpolated to the observation
locations (POBS). The specific types of observation data stored
in the system are:
- SAMS
- METAR
- RAWS
- Soundings
- Profilers
- Lightning
These data are loaded into the database by ingesters running continuously
on the 4DWX server host. The database is constantly growing,
as the data are not scrubbed. A rough approximation of the amount
of data stored in the database is on the order of 25 GB per year,
for each of the five Army test ranges scheduled for implementation
by the end of FY 00.
Retrieval Interface
One of the drawbacks to developing applications using a relational
database is that detailed knowledge of the database schema is required
by an applications programmer. Changes to the design of the
database will have an effect on every application that uses the
database.
In order to circumvent this problem, a higher-level interface has
been developed for data retrieval and all applications which tie
into the database. Thus, if the underlying structure of the
database changes, then only the components of the retrieval interface
need to be updated. Documentation for the retrieval interface can
be found at http://atec-server/classdoc.
Servlets Data servlets are also under development as part
of the 4DWX data management infrastructure. Servlets are java
classes which are executed by a web server when certain requests
are received. These data servlets were written using the 4DWX
Retrieval Interface and allow applications to request data using
the http protocol. A principal advantage of this design is
that is enables access to the Army test range databases which reside
behind firewalls.
Ingesters 4DWX Ingesters were written in Perl or Java.
These programs load the data into the database tables, using SQL
and either Perl's DBI interface or Java's JDBC interface.
Eventually all of the ingesters will be written in Java, and they
will be controlled by a multi-threaded ingest process controller.