1. Core Forecasts

The DICast™ system generates point forecasts for user-defined locations (e.g., cities, observation sites, locations along the highway system, and agricultural fields). At observation sites, tuning the forecast parameters based on past performance helps improve the forecasts. Sites in this class are called core forecast sites. Forecasts at non-core sites are derived from forecasts at core sites.

The numerical weather model data used by DICast™ has three-hourly resolution. Since the system is driven primarily by model data, forecasts are initially generated at three-hour intervals. These times are called the core forecast lead times.


2. DICast™ Forecast Modules

DICast™ creates several independent forecast estimates. Each forecast module attempts to create the best forecast it can by applying a specific forecast technique to its input data set. Each module uses one of three basic techniques:

· Dynamic Model Output Statistics (DMOS),
· Interpolation of NWS MOS site forecasts, and
· Semi-static techniques.

Each forecast module produces an identically formatted output file. No forecast module is dependent on another forecast module. That is, no forecast module's output is used as input to another forecast module.

2.1 Dynamic MOS Forecast Modules

The Dynamic MOS (DMOS) forecast modules are a dynamic variation of the traditional NWS MOS procedures. Like traditional MOS, DMOS finds relationships between model output data and observations using linear regression. However, while MOS equations are calculated from many years of data, DMOS uses only the last three months (configurable). The regression equations are recalculated once per week.

The DMOS technique has several advantages over traditional MOS. The reliance on only a short history allows DMOS equations to be calculated and DMOS forecasts generated for newly ingested models or models that are changing due to enhancements. Traditional MOS equation generation would require the model to be stable (no changes) for several years. Also, the MOS equations are calculated painstakingly with a large human quality control effort. This makes it difficult to add MOS equations for a new set of forecast sites. DMOS forecasts can be made at these sites immediately provided they have an observational history of at least three months (configurable).

A disadvantage of DMOS is that the equations it produces are less stable than MOS equations. For this reason, quality control checks must be put in place to ensure that the equations produced do not create nonsensical outlier forecasts.

The DMOS subsystem applied to any model has three components:

  • Regressor calculation,
  • Empirical Relationships Generator, and
  • Forecast Generator.

The interaction of these three components is illustrated in Figure 1.

 


2.2 Regressor Calculation

Regressors are variables extracted or derived from model data that are likely to have a relationship to one of the output forecast variables. These regressors are calculated at each forecast site for each forecast lead time. About two-thirds of the regressors are variables extracted directly from the model data. The other regressors are derived by combining several variables to estimate meteorological data not explicitly predicted by the models.

Since the forecast sites are rarely at model grid points, interpolation techniques are used to estimate values at the forecast sites. This requires an understanding of the projection of the model grid and the terrain assumptions used in each model. Because some of the regressors are estimates of meteorological variables at the earth's surface, correcting for the simplified terrain used by the model is important, and the correction varies from model to model. The regressors from one model run are all stored in one file. The regressor files are added to a regressor history that the DMOS empirics process uses to calculate regression equations.
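
The source does not say which interpolation scheme is used, and terrain correction differs from model to model. As a rough illustration only, the sketch below assumes plain bilinear interpolation on a regular grid, with the site position already converted to fractional grid coordinates. The function name and the coordinate convention are hypothetical, and no terrain correction is attempted.

```python
def bilinear(grid, x, y):
    """Interpolate grid[row][col] at fractional column x and fractional row y.

    Hypothetical helper: assumes a regular grid of at least 2 x 2 points.
    """
    # Clamp the lower corner so that x0 + 1 and y0 + 1 stay inside the grid.
    x0 = min(max(int(x), 0), len(grid[0]) - 2)
    y0 = min(max(int(y), 0), len(grid) - 2)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two bracketing rows, then along y.
    row_a = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    row_b = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return row_a * (1 - fy) + row_b * fy
```

A site that sits exactly on a grid point gets that grid point's value back; a site in the middle of a cell gets the mean of the four surrounding values.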

2.3 DMOS Empirical Relationships Generator

The DMOS Empirical Relationships Generator attempts to find relationships between the regressors and the observations at forecast sites using a linear regression technique. There are tradeoffs involved in determining the best regression equation. The goodness-of-fit measure of a regression equation is its r-squared value. Typically, adding more regressors to an equation increases the r-squared value. However, this also increases the variance of the output forecasts, since regressors are included that do not have a strong relationship to the predictand. The desired set of regressors therefore carries most of the information needed for a good prediction and excludes noisy regressors.

Equations that do not have a sufficiently high r-squared value are replaced with a default equation. This default equation is a predefined combination of regressors defined by a meteorologist. A default equation is an attempt to generically replicate a meteorologist's logic in coming up with a forecast. Special, usually derived regressors have been developed for this specific purpose. These default equations generally do not produce the erroneous forecasts that a low r-squared equation might.
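
The fit-then-fall-back logic can be sketched as follows. This is a minimal illustration with a single regressor, not the system's actual regression code: the function names, the (intercept, slope) representation of an equation, and the 0.5 threshold are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # Degenerate history: no usable relationship can be estimated.
        return my, 0.0, 0.0
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def choose_equation(xs, ys, default, min_r2=0.5):
    """Keep the fitted (a, b) only if its r-squared reaches the threshold.

    `default` stands in for the meteorologist-defined default equation;
    the 0.5 threshold is an arbitrary illustrative value.
    """
    a, b, r2 = fit_line(xs, ys)
    return (a, b) if r2 >= min_r2 else default
```

With a history that lies exactly on a line the fitted equation is kept; with a history that shows little linear relationship the default is returned instead.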

The best combination of regressors varies from site to site and between forecast lead times, and it differs for each forecast variable. The relationships also vary from season to season and from model to model. The empirics generator is run once per week for each model to find the equations that best fit the most recent data. These equations are stored in a DMOS empirics file and used later by the DMOS forecast generator.

2.4 DMOS Forecast Generator

The DMOS Forecast Generator applies the empirical relationships produced by the DMOS Empirical Relationships Generator to the most recent regressors. This generates the DMOS forecast. In other words, the regression relationships that have recently done well at predicting the observations are applied to today's regressor data. If any regressor that appears in a regression equation is missing, a missing forecast is generated.
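
A minimal sketch of this step, assuming an equation is stored as an intercept plus a mapping from regressor name to coefficient. That layout, the regressor names in the usage note, and the use of `None` as the missing-data flag are illustrative choices, not the format of the DMOS empirics file.

```python
MISSING = None  # stand-in for the system's missing-data flag (assumption)


def dmos_forecast(equation, regressors):
    """Apply one regression equation to today's regressor values.

    equation:   {"intercept": float, "coeffs": {regressor_name: coefficient}}
    regressors: {regressor_name: value or MISSING}
    """
    total = equation["intercept"]
    for name, coef in equation["coeffs"].items():
        value = regressors.get(name, MISSING)
        if value is MISSING:
            # One missing regressor makes the whole forecast missing.
            return MISSING
        total += coef * value
    return total
```

For example, with intercept 1.0 and coefficients 0.5 on a hypothetical `t2m` regressor and 2.0 on a hypothetical `wind` regressor, values of 10 and 3 give 1.0 + 5.0 + 6.0 = 12.0, while dropping `wind` from the input returns the missing flag.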

2.5 NWS MOS Forecast Modules

These forecast modules are based on the MOS products generated by the National Weather Service (NWS). These products are not a perfect match to the desired Maintenance Decision Support System (MDSS) forecasts. The MOS data consist of point forecasts at sites chosen by the NWS. These MOS sites are generally a subset of the MDSS forecast sites. Also, the variables forecast in the MOS output vary among the NWS models. In addition, the variables do not directly match the MDSS forecast variables, and the forecast lead times may not match the MDSS forecast lead times.

At a site included in any particular NWS MOS forecast, the forecast module tries to reproduce the exact forecast. Where MDSS variables are explicitly forecast in the MOS product, they are simply copied. Otherwise, if reasonable, the MDSS forecast variable is derived from the MOS data. For some variables, no derivation is reasonable and these variables are left as missing data. If the forecast lead times of the MOS product do not match the MDSS forecast times, the forecast module makes an interpolated forecast where possible.

For the majority of the MDSS sites, no MOS forecasts exist. Forecasts for these sites are generated by interpolation techniques, using the forecasts generated at the MOS sites. No satisfactory interpolation technique has been found that works well for all variables in rough terrain. For example, the interpolation of surface winds in the mountains does not work well using any known technique.

2.6 Semi-static Forecast Modules

Two forecast modules are called semi-static because their forecasts depend only on historical data, not on any predictive forecast model. These are the climatology and persistence forecast modules. Both look at past weather, over different time ranges, and base their forecasts on the average weather seen. The climatology forecast module uses data from up to the last 30 years. Monthly averages of the MDSS forecast variables have been computed and stored in a climatology file, and these monthly climatological values are interpolated to the forecast date. The persistence forecast module averages the observations of the MDSS variables from recent days to produce its forecast. The persistence and climatology forecast modules have more effect at longer forecast lead times (> 72 hours). They will not contribute significantly in the MDSS FP, which will be configured to provide guidance only out to 48 hours.
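
Both semi-static techniques reduce to simple averaging and interpolation. The sketch below is one possible reading, not the system's code: it assumes each monthly mean is valid at the middle of its month, that interpolation between months is linear, and that missing observations are flagged with `None`. The function names are hypothetical.

```python
import calendar
from datetime import date


def climatology(monthly, day):
    """Interpolate 12 monthly means (index 0 = January) to a calendar date.

    Assumption: each mean is valid at mid-month; interpolation is linear
    and wraps around from December to January.
    """
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    # Position of the date on a 0..12 axis through the year.
    pos = (day.month - 1) + (day.day - 0.5) / days_in_month
    shifted = (pos - 0.5) % 12  # mid-January maps to 0.0
    lo = int(shifted)
    frac = shifted - lo
    return monthly[lo] * (1 - frac) + monthly[(lo + 1) % 12] * frac


def persistence(recent_obs):
    """Average the recent observations, skipping missing (None) values."""
    valid = [v for v in recent_obs if v is not None]
    return sum(valid) / len(valid) if valid else None
```

A date in the middle of January returns the January mean unchanged, and a date early in January blends the December and January means.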


3. Forecast Integration

3.1 Integration Overview

The DICast™ forecast modules each generate as complete a forecast as possible: a forecast for every forecast variable at every forecast site for every forecast lead time. The integrator combines these independent forecast estimates into one final consensus forecast. Numerous combination techniques have been developed; investigation led to the choice of an enhanced Widrow-Hoff learning method. This method creates its final forecast as a weighted average of the individual module forecasts. The weights are modified daily by nudging them along the error gradient in weight space, in the direction that reduces the error. As a result, forecast modules that have been performing well for a particular forecast (variable, site, and lead time) gain weight and poorly performing modules lose weight. Note that a separate weight vector exists for every forecast generation time, because the latencies of the input data sets differ. The interaction of the integrator's components is illustrated in the figure below.

 

 

3.2 Integrator Empirics

This DICast™ process runs once per day and updates all the weights based on the performance of the various forecast modules. It reads the observations from the previous day and compares them with the forecast module outputs that predicted those observations. For each forecast, the errors and the gradient vector in weight space are computed. A step proportional to the size of the combined error is taken along that gradient to compute the new weights.
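
The source describes the method only as an enhanced Widrow-Hoff scheme and does not detail the enhancements. The sketch below shows the plain Widrow-Hoff (least mean squares) update for a single forecast (one variable, site, and lead time). The learning rate, the clipping of negative weights, and the renormalization to a sum of one are assumptions added so that the result remains a weighted average; they are not taken from the source.

```python
def lms_update(weights, module_forecasts, observation, rate=0.01):
    """One Widrow-Hoff step for the weights of a single forecast.

    `rate` is a hypothetical learning rate; clipping and renormalizing
    are illustrative additions, not part of the classic rule.
    """
    consensus = sum(w * f for w, f in zip(weights, module_forecasts))
    error = observation - consensus
    # The gradient of 0.5 * error**2 with respect to weight i is
    # -error * f_i, so stepping against it adds rate * error * f_i.
    stepped = [w + rate * error * f for w, f in zip(weights, module_forecasts)]
    clipped = [max(w, 0.0) for w in stepped]
    total = sum(clipped)
    if total == 0:
        return list(weights)  # keep the old weights rather than divide by zero
    return [w / total for w in clipped]
```

Starting from equal weights with module forecasts of 10 and 14 and an observation of 10, the update shifts weight toward the first module, and the new consensus lies closer to the observation than the old one.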



3.3 Integrator

The integrator creates a final forecast by making a bias-corrected, confidence-weighted sum of the individual module forecasts. It reads the forecasts from the forecast module output files and the weights from the integrator empirics file, performs its calculations, and stores the results.
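
A bias-corrected weighted combination might look like the sketch below. It assumes that a module's bias is its mean past error (forecast minus observation) and that modules with a missing forecast are skipped, with the remaining weights renormalized. Both choices are assumptions for illustration; the source does not specify them.

```python
def integrate(forecasts, weights, biases):
    """Combine module forecasts into one consensus value.

    forecasts: module forecasts, None where a module has no forecast
    weights:   non-negative weights, one per module
    biases:    assumed mean (forecast - observation) error per module
    """
    numerator = 0.0
    denominator = 0.0
    for forecast, weight, bias in zip(forecasts, weights, biases):
        if forecast is None:
            continue  # skip modules that produced no forecast
        numerator += weight * (forecast - bias)
        denominator += weight
    return numerator / denominator if denominator > 0 else None
```

With forecasts of 10 and 14, weights of 0.75 and 0.25, and biases of 0 and 2, the consensus is 0.75 × 10 + 0.25 × 12 = 10.5.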

3.4 Non-verifiable Data Extractor

The DICast™ forecasting techniques described above apply only to core forecast variables. These are variables that are regularly measured and reported in meteorological observation data. The DMOS forecast modules and the integrator both require such observations to tune themselves. Variables that are not regularly observed cannot be tuned in this way, so the Non-Verifiable Data (NVD) extractor forecasts them with a fixed weighted combination of model variables. The weights used in the combination are predetermined by a meteorologist familiar with the models and stored in a configuration file. The model variables to be combined have been extracted by the DMOS regressor calculation process and stored in a regressor file. The NVD extractor reads the appropriate models' regressor files along with the weight configuration file before creating its weighted combination output.

3.5 Post-processor

The post-processor provides a variety of processing options to merge the integrator's forecasts and the NVD forecasts. It also attempts to remove unreasonable forecasts, derive other forecast variables, and spatially and temporally interpolate the forecasts to non-core forecast sites.

Quality control measures are applied to the integrator's output to ensure that no forecasts fall well beyond reasonable ranges. Forecast values slightly beyond the limits are reset to the bounding values. For example, a forecast of 101% probability of precipitation becomes a forecast of 100%. Forecasts well beyond the bounds are replaced with a missing data flag.
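
The range check can be sketched as follows. The `tolerance` parameter, which separates values just outside the range from values well beyond it, and the use of `None` as the missing-data flag are assumptions; the source does not say how far outside the range a value may fall before it is discarded.

```python
MISSING = None  # stand-in for the system's missing-data flag (assumption)


def quality_control(value, low, high, tolerance):
    """Clamp values just outside [low, high]; flag values far outside it."""
    if value is MISSING:
        return MISSING
    if value < low - tolerance or value > high + tolerance:
        return MISSING  # well beyond the bounds: discard
    return min(max(value, low), high)  # inside or slightly outside: clamp
```

With bounds of 0 and 100 and a hypothetical tolerance of 5, a probability of precipitation of 101 becomes 100, while a value of 150 is replaced by the missing flag.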

Forecast variables required by users are derived from the core set of MDSS forecast variables. For example, relative humidity is derived from temperature and dew point temperature. The output of the integrator contains forecasts only for core forecast sites. Forecasts at the non-core sites are generated by spatial interpolation from the core sites' forecasts. The three-hourly forecasts are interpolated in time to hourly values to produce the desired final temporal resolution.
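
Two of these steps lend themselves to a short sketch. The source does not state which humidity formula or which temporal interpolation is used; the example assumes a Magnus-type saturation vapor pressure approximation (Bolton's 1980 constants, temperatures in degrees Celsius) and linear interpolation between three-hourly values with no missing data. The function names are hypothetical.

```python
import math


def relative_humidity(temp_c, dewpoint_c):
    """Relative humidity (%) from temperature and dew point in deg C.

    Uses the Magnus-type approximation with Bolton (1980) constants;
    the choice of formula is an assumption, not taken from the source.
    """
    def saturation_vapor_pressure(t):
        return 6.112 * math.exp(17.67 * t / (t + 243.5))  # hPa

    return (100.0 * saturation_vapor_pressure(dewpoint_c)
            / saturation_vapor_pressure(temp_c))


def to_hourly(three_hourly):
    """Linearly interpolate three-hourly values to hourly values."""
    hourly = []
    for start, end in zip(three_hourly, three_hourly[1:]):
        # Emit the start value and the two interior hours of each interval.
        hourly.extend(start + (end - start) * k / 3 for k in range(3))
    if three_hourly:
        hourly.append(three_hourly[-1])  # close the series with the last value
    return hourly
```

When the dew point equals the temperature the relative humidity is 100%, and three-hourly values of 0, 3, and 6 expand to the seven hourly values 0 through 6.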

 
