Quality Assessment/Verification

Background


Example of regional performance of the Current Icing Potential (CIP), showing ROC area. ROC area is a measure of skill, which ranges between 0 and 1, with larger values associated with higher skill.
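As a rough illustration of the ROC area described in the caption, the sketch below computes it for a set of potential values paired with yes/no observations. The function name and the sample values are hypothetical and are not taken from the CIP evaluation; the calculation is the standard trapezoidal area under the curve of hit rate against false alarm rate, for which a value of 0.5 corresponds to no skill.

```python
def roc_area(potentials, observed):
    """Trapezoidal area under the ROC curve.

    potentials: forecast values (e.g., an icing potential between 0 and 1)
    observed:   matching yes/no observations (True where the event occurred)
    """
    n_events = sum(1 for o in observed if o)
    n_nonevents = len(observed) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")

    # Sweep thresholds from strictest to most lenient; each threshold
    # yields one (false alarm rate, hit rate) point on the curve.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(potentials), reverse=True):
        hits = sum(1 for p, o in zip(potentials, observed)
                   if p >= threshold and o)
        false_alarms = sum(1 for p, o in zip(potentials, observed)
                           if p >= threshold and not o)
        points.append((false_alarms / n_nonevents, hits / n_events))

    # Integrate hit rate over false alarm rate with the trapezoid rule.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical sample: potential values and whether icing was reported.
sample_potentials = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
sample_observed = [True, True, False, True, False, False]
sample_area = roc_area(sample_potentials, sample_observed)
```

A regional comparison like the one in the figure would repeat this calculation separately for the forecast and observation pairs that fall in each region.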

The FAA Aviation Weather Research Program (AWRP) conducts research aimed at improving weather forecasts for the aviation community. Much of this research, in the form of automated algorithms to predict aviation weather phenomena such as icing and turbulence, is transferred from research laboratories to the National Weather Service (NWS) through the Aviation Weather Technology Transfer (AWTT) process. RAL has a significant role in developing appropriate methods to evaluate these products and to confirm their scientific capabilities as they transition to operations. This work includes identification of appropriate observational datasets; development of methodologies that are appropriate for the specific sets of forecasts and observations and that provide meaningful results in the context of the operational application of the forecasts; implementation of the methodologies in controlled evaluations and intercomparisons; and analysis of the results of evaluation studies to assess the performance of the new forecasting techniques relative to the current operational forecasts.

Much of the forecast evaluation work undertaken for the AWRP (especially for AWTT evaluations) is carried out by the Quality Assessment Product Development Team (QA PDT). This team is a partnership between the Verification Group in RAL and members of the Forecast Verification Section of the Global Systems Division of the NOAA Earth System Research Laboratory (ESRL/GSD). B. Brown in RAL is the co-lead of the QA PDT, and several RAL staff members (M. Chapman, T. Fowler, L. Holland, and A. Takacs) are members of the PDT.

Current Activities


Example of new object-based forecast evaluation methodology, applied to convective nowcasts produced by the NCAR Autonowcaster.

The main purpose of the QA PDT is to objectively evaluate the forecasting performance of new automated aviation weather forecasting and diagnostic algorithms, to ensure that the algorithms provide improved forecasting capabilities. The QA PDT evaluates the algorithms’ performance through intensive, independent assessment exercises. Each algorithm transitioning through the AWTT process is evaluated for a three-month period, with its performance compared to an operational standard. The results are summarized in a written report and provided to the AWTT Technical Review Panel (TRP) as input to the transition process.

The QA PDT establishes objective procedures for evaluating forecast quality and accuracy, using advanced weather observations and measurements from remote-sensing instruments. For example, L. Holland and A. Takacs identified and used a variety of satellite and ground-based measurements in a recent assessment of a cloud-top height diagnostic developed by the Oceanic Weather PDT. In some cases, special data collection efforts are undertaken: in one recent example, J. Braid, B. Brown, and T. Fowler organized an ongoing special pilot report (PIREP) collection project as part of the NASA TAMDAR Great Lakes Fleet Experiment.

The specific measurements and observations selected for an evaluation form the basis for the verification methodologies and estimates of forecast accuracy. Due to the nature of aviation weather, the available observational datasets often do not directly represent the forecast attributes being evaluated; thus, inferences and comparisons based on a variety of observational datasets are used to establish forecast quality. The statistical verification methodologies are designed to represent the operational use of the aviation forecasts. Because the characteristics of aviation forecasts and observations often differ from those of standard forecasts and observations, the QA PDT typically must develop new, advanced verification methodologies that measure forecast accuracy more appropriately than standard approaches.

For example, the QA PDT has developed specific criteria regarding the appropriate use of information from PIREPs for evaluation of icing and turbulence forecasts. These criteria are necessary because the reports are non-systematic and subjective. To represent the uncertainty associated with various verification statistics, T. Fowler developed methods based on re-sampling techniques to estimate confidence intervals for the statistics. These methods are commonly applied in the QA studies to provide meaningful comparisons among forecasting systems.

New object-based methods are being developed by R. Bullock, B. Brown, and J. Halley Gotway to provide more meaningful measurements of the performance of convective weather forecasts. This approach provides one pathway toward verification metrics that are operationally meaningful. The QA PDT (including R. Bullock, J. Wolff, and J. Halley Gotway) is also developing the ability to evaluate forecast performance as a function of air traffic flow reductions and operational risk.
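The re-sampling approach to confidence intervals mentioned above can be sketched with a percentile bootstrap, which is one common technique of this kind and not necessarily the method used in the QA studies. The example assumes that forecast and observation pairs can be resampled independently, which may not hold for clustered reports such as PIREPs. The function names, the choice of probability of detection as the statistic, and the counts are all hypothetical.

```python
import random


def probability_of_detection(pairs):
    """POD = hits / (hits + misses) over (forecast_yes, observed_yes) pairs."""
    hits = sum(1 for forecast, observed in pairs if forecast and observed)
    misses = sum(1 for forecast, observed in pairs if not forecast and observed)
    observed_events = hits + misses
    return hits / observed_events if observed_events else float("nan")


def bootstrap_interval(pairs, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a verification statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(pairs)
    estimates = []
    for _ in range(n_resamples):
        # Draw n pairs with replacement and recompute the statistic.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = statistic(resample)
        if value == value:  # drop NaN (a resample with no observed events)
            estimates.append(value)
    estimates.sort()
    lower_index = int((alpha / 2) * len(estimates))
    upper_index = min(len(estimates) - 1,
                      int((1 - alpha / 2) * len(estimates)))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical counts: 30 hits, 10 misses, 15 false alarms,
# and 45 correct negatives.
sample_pairs = ([(True, True)] * 30 + [(False, True)] * 10 +
                [(True, False)] * 15 + [(False, False)] * 45)
pod = probability_of_detection(sample_pairs)
pod_lower, pod_upper = bootstrap_interval(sample_pairs, probability_of_detection)
```

When two forecasting systems are verified against the same observations, intervals that overlap heavily suggest that the difference in the statistic may not be meaningful.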

Results/Recent Accomplishments


Histogram showing error in ceiling height (METAR-NCV-A). Figure taken from Quality Assessment Report: National Ceiling and Visibility Analysis Product, 2005: Quality Assessment Team.

Over the past year, T. Fowler, A. Holmes, and others in the QA PDT completed evaluations for the National Ceiling and Visibility Analysis (NCV-A) algorithm, in support of its transition to experimental status. This evaluation was performed using a cross-validation technique, in which subsets of observation stations were held out from the algorithm and used for verification. Another study, which included contributions from A. Takacs, L. Holland, R. Hueftle, and E. Gilleland, focused on evaluation of the Oceanic Weather Cloud Top Height algorithm; this study used a variety of observational datasets, including radar, rawinsonde, and satellite, to infer forecast quality over oceanic domains. The results of these studies were summarized in written reports and provided to the AWTT TRP. The QA PDT also continued to prepare for evaluations of the National Convective Weather Forecast (NCWF-2); the NCV Forecast product; the Current and Forecast Icing Potential severity algorithms; and the Graphical Turbulence Guidance.
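The hold-out idea behind the NCV-A cross-validation can be sketched as follows. The NCV-A algorithm itself is not described here, so a simple inverse-distance-weighted interpolation stands in for the analysis step; that stand-in, the function names, the fold count, and the synthetic station values are all assumptions made for illustration.

```python
import math
import random


def idw_analysis(known_stations, x, y, power=2.0):
    """Stand-in analysis: inverse-distance-weighted ceiling height at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for station_x, station_y, ceiling in known_stations:
        distance = math.hypot(station_x - x, station_y - y)
        if distance == 0.0:
            return ceiling
        weight = 1.0 / distance ** power
        weighted_sum += weight * ceiling
        weight_total += weight
    return weighted_sum / weight_total


def holdout_errors(stations, n_folds=5, seed=0):
    """Withhold each fold of stations in turn and verify the analysis there.

    Returns one error (observation minus analysis) per station.
    """
    rng = random.Random(seed)
    shuffled = stations[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]

    errors = []
    for k, held_out in enumerate(folds):
        # The analysis sees only the stations outside the withheld fold.
        retained = [s for j, fold in enumerate(folds) if j != k for s in fold]
        for x, y, observed_ceiling in held_out:
            analyzed = idw_analysis(retained, x, y)
            errors.append(observed_ceiling - analyzed)
    return errors


# Synthetic stations on a 5 x 5 grid: (x, y, ceiling height).
sample_stations = [(float(i), float(j), 1000.0 + 50.0 * i + 30.0 * j)
                   for i in range(5) for j in range(5)]
sample_errors = holdout_errors(sample_stations)
mean_abs_error = sum(abs(e) for e in sample_errors) / len(sample_errors)
```

Because each station is verified only when it has been withheld from the analysis, the errors give an independent estimate of accuracy at locations the analysis has not seen, and they can be binned into a histogram like the one shown in the figure.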