Spatial Forecast Verification Methods
- Getting Started
SpatialVx is an R package for performing spatial forecast verification. Most of the recently proposed methods are or will be included.
Note: This package is still under development and has not been thoroughly tested. Furthermore, major changes may still take place to the existing code.
See the Content tab above to find out what major methods are currently available. More coming soon.
Installation is easy if you're familiar with R. In either case, see the Installation tab above to learn how to install the package.
The Updates tab shows the progression of the package, and gives more details about what has been included. This will be re-set upon release of version 1.0-0 (to be the first official release).
When a solid foundation of all the methods to be included is finished, then a user's manual, quick-start guide and tutorial will be created, and made available under the Getting Started tab.
Please consider registering* if you use, or plan to use, the package.
Once you have registered, you can sign up to receive updates on the package. Note that it is not necessary to register to get help, but it saves time! Information collected is not shared with third parties, and can be deleted upon request. Questions about the package can be addressed to Eric Gilleland.
The following table lists the major methods currently available in SpatialVx. See the help file for the primary function for more details on a particular method.
|Field Signficance||spatbiasFS, LocSig, MCdof||Simulation-based field significance testing|
|Spatial Prediction Comparison Test||spct (regular or irregular grids) and for regular grids only: lossdiff (find the loss differential field for a specific loss function), empiricalVG.lossdiff (estimate the variogram for the loss differential field), flossdiff (fit a parametric variogram model to the empirical variogram for the loss differential field), and summary to finally do the test.||Test for competing forecasts that accounts for spatial dependence|
|Fractions Skill Score (FSS)||hoods2d||Compares event frequencies within neighborhoods|
|Fuzzy Logic||hoods2d||Compares event frequencies within neighborhoods with alternative definitions for hits, misses, false alarms and correct negatives|
|Joint Distribution||hoods2d||Compares event frequencies within neighborhoods with alternative definitions for hits, misses, false alarms and correct negatives|
|Minimum Coverage||hoods2d||Compares event frequencies within neighborhoods|
|Multiple Event Contingency Table||hoods2d||Compares event frequencies within neighborhoods of the forecast against binary events in the verification field|
|Practically Perfect Hindcast||pphindcast2d||Compares event frequencies within neighborhoods of the forecast against binary events in the verification field|
|Pragmatic||hoods2d||Compares event frequencies within neighborhoods of the forecast against binary events in the verification field|
|Upscaling||upscale2d||Computes traditional forecast verification on smoothed fields|
|Wavelet De-Noising||wavePurifyVx||De-noises the verification set by applying a discrete wavelet transform to each field, setting small wavelet coefficients to zero and then re-constructing the fields|
|Intenisty Scale (IS)||waveIS||Applies summary measures to detail fields of the binary verification set|
|Structure Function||structurogram, structurogram.matrix||Calculates the structure function for different spatial separations|
|Variogram||variogram.matrix||Variability at different spatial separations (special case of the structure function)|
|Wavelet Details||waverify2d||Applies summary measures to detail fields of the verification set|
|Baddeley's binary image metric||locmeasures2d||Modification of the Hausdorff metric that takes the Lp-norm of the difference in distance maps for the verification set|
|Forecast Quality Index||FQI, UIQI||Image metric informing about both loation and intensity errors|
|Geometric characterizations||Aindex, Cindex, Sindex||Summary indices of shapes and textures of each field|
|Hausdorff metric||locperf, locmeasures2d||Maximum difference in distance maps for the verification set|
|Mean Error Distance||locperf, locmeasures2d||The mean of shortest distances between events in the forecast to events in the verification set|
|Mean Square Error Distance||locperf, locmeasures2d||The mean of squared shortest distances between events in the forecast to events in the verification set|
|Minimum Separation Between Boundaries||locperf, locmeasures2d||Smallest value of the distance map from one field over the subset where events occur in the other field|
|modified Hausdorff Measure||locperf, locmeasures2d||Modification of the Hausdorff metric that compares the maximum of the mean error distance|
|partial Hausdorff Measure||locperf, locmeasures2d||Modification of the Hausdorff metric that compares the k-th largest difference in distance maps for the verification set|
|Pratt's Figure of Merit||locperf, locmeasures2d||Compares binary images|
|Zhu's metric||metrV||linear combination of two measures: (i) the amount of overlap between events in two fields and (ii) the mean error distance|
|Image Moments||imomenter||Yields useful information about an object or field, such as the centroid, orientation angle, etc.|
|Optical Flow (OF)||OF||Deform the forecast field to better match the observed field using gradient information|
|Rigid Transformation||rigider, rigidTransform||Calculate the optimal rigid transformation (translations and rotations)|
|Cluster Analysis||clusterer, CSIsamples||Cluster analysis applied to the locations and possibly also the intensities|
|Contiguous Rain Area||craer||Performs rigid transformations on matched features, then breaks down the MSE based on different sources of errors.|
|Method of Object-based Diagnostic Evaluation (MODE)||Identification of features: convthresh, threshfac and threshsizer. As of version 0.3 (to be released), these three functions are deprecated, having been combined into the single function, FeatureFinder.
Merging and/or matching features: centmatch, deltamm, minboundmatch.
Analyzing features: FeatureMatchAnalyzer, FeatureAxis, FeatureProps, FeatureComps, FeatureTable and interester.
|MODE has numerous possibilities whereby one attempts to (i) identify, (ii) possibly merge and (iii) match features between fields, and then apply summaries (e.g., the above field displacement methods).|
|Structure, Amplitude, Location (SAL)||saller (see also functions for identification of features for MODE above)||Compares the distributions of identified features in terms of structure, amplitude and location|
|Composite||compositer (see also functions for identification of features for MODE above)||Places features onto a relative grid so that they all share the same centroid.|
|Shape Analysis||hiw, distill and summary (see also functions for identification of features for MODE above)||Identifies landmarks (boundary points) for each feature, and sets up an object that can be analyzed by functions from the shapes package. Also finds the translation SS and intensity SS values for every pair of features between fields.|
|Anomoly Correlation Coefficient (ACC)||ACC||Correlation using a climatology instead of the mean|
|Bias-adjusted Gilbert Skill Score||vxstats||Adjusted traditional GSS to account for spatial bias|
|Gaussian Mixture Model (GMM)||gmm2d||Fits a Gaussian Mixture Model to each field and makes comparisons using the estimated parameters (this is a cross between a field summary, because it is applied to the entire field, and a feature-based approach because it informs about specific features whereby the number of features is determined a priori)|
|Geographic Box-Plot||GeoBoxPlot||Box-plots that allow for incorporation of differing areas of grid squares when aggregating over space.|
|S1 Score||S1||A comparison of gradients between two fields|
SpatialVx is a new R package with functions to perform many of the spatial forecast verification methods compared in the ICP (as well as some others).
First, you must have R installed on your computer. To install R, go to the R project home page (http://www.r-project.org).
To install SpatialVx, simply open an R session, and from the R prompt, type:
update.packages(checkBuilt = TRUE)
The first line is to ensure that you have the most recent build of the packages on which SpatialVx depends (if you have any of them previously installed, which is not entirely unlikely given the numerous dependencies). If you do not have the correct permissions, you may have to have your systems administrator install the package for you. Alternatively, you may be able to provide an alternate directory to which you can install R packages. For example, say you have permissions to install packages to the directory "[path]/library". Then replace the second line above with
install.packages("SpatialVx", lib = "[path]/library")
Once the package is successfully installed, you must load it into your R session in order to use it (this must be done each time you begin a new R session). This can be done with
Or, if you installed it to a directory where R does not know to search, then, supposing it is installed in [path]/library, use:
library(SpatialVx, lib.loc = "[path]/library")
To see how to cite SpatialVx, type from the R prompt:
Updates for SpatialVx
- Version 0.1-0
- Version 0.1-1
- Version 0.1-2
- Version 0.1-3
- Version 0.1-4
- Version 0.1-5
- Version 0.1-8
- Version 0.1-9
- Version 0.2-0
- Version 0.2-1
- Version 0.2-2
- Version 0.2-3
- Version 0.2-4
- Version 0.3
- Version 0.4
- Version 0.5
Version 0.5 (to be submitted)
Several bugs were found in deltamm that either caused errors, or just gave bad results. These bugs have been fixed, but note that I still consider this function to be under testing.
Version 0.4 (submitted to CRAN on 22 February 2016)
A bug in locmeasures2d (or, really, locperf) was discovered by Barbara Casati whereby the modified Hausdorff distance returned some kind of gibberish. An additional bug was also discovered whereby if one did not call one of a few of the other measures, the function would error out. Attempting to fix this bug proved to be more trouble than it is worth. Therefore, this measure has been removed from consideration in this package. It can, however, easily be computed by applying locmeasures2d to compute mean error distance in both directions (i.e., A = locmeasures2d( object = obsField, Y = fcstField, which.stats = "med" ) and B = locmeasures2d( object = fcstField, Y = obsField, which.stats = "med")) and then taking the maximum of the two values returned (or the minimum, as is defined in some more recent literature).
Per a request, a function to do the spatial prediction comparison test (SPCT) for irregular spatial locations has been added. Namely, spct, which works with expvgram; not to be confused with expvg, which works with the old spatMLD (now lossdiff) to do SPCT on regular grids.
spatMLD has been renamed to lossdiff, fit.spatMLD to flossdiff and empiricalVG.lossdiff has been added as an additional step. The trend estimation used by spatMLD did not test for significance of the trend. It is an important step that perhaps should not have been automated. The procedure now has a new step conducted by the new function empiricalVG.lossdiff that fits the empirical variogram to the loss differential field created by lossdiff. It is now up to the user to determine if a trend exists, and if so, estimate that trend before calling empiricalVG.lossdiff. The trend can then either be removed before calling empiricalVG.lossdiff or within the call (the trend removed is returned in the output object). The function lossdiff still estimates a trend using lm, and the output object is printed in the call to print so that one can quickly see the results.
The three feature identification functions (convthresh, threshfac and threshsizer) have been combined into a single function called FeatureFinder, which also allows for more flexibility including: (i) combining the different methods (e.g., convolution smoothing with a further requirement that all features be at least min.size grid squares large), (ii) allows one to identify only smaller features (using the max.size argument), (iii) allow for a different smoothing parameter for each of the two fields (if smoothing is performed), (iv) allow for a different min/max size across fields, and (v) allow for different threshold factors for the two fields.
The fit.spatMLD function has been modified to use a more reliable fitting routine so that, with luck, it will not fail to find a good fitting variogram as often. It is possible to write your own variogram function to use with fit.spatMLD.
A bug fix from a dependent package caused an error on a developmental version of R. This version fixes that problem.
This version includes a bug fix where centmatch would sometimes fail erroneously if only one feature existed in one or both fields.
Additionally, new functions now exist that allow one to carry out the contiguous rain area (CRA) method of Ebert and McBride (2000). In particular, the new functions rigider (finds an optimal rigid transformation), rigidTransform (performs a rigid transformation for given translation and rotation parameters), Fint2d (2-d interpolation), and craer (perform the CRA method for any object of class "matched". Also, a new feature matching function based on minimum boundary separation has been added (minboundmatch).
This version fixes a user-found bug in the print function for deltamm objects whereby it could not figure out that matches existed even when they did. The function imomenter has also been added for calculating image moments.
This update addresses some new changes in the spatstat package related to some now depracated functions. Thanks to Adrian Baddeley for informing me about them.
A bug in the vxstats function has been fixed whereby it used to give erroneous results for the contingency-table based smoothing filter methods.
Additionally, functionality for compositing features (compositer) and for identifying boundary points along user-specified angles (hiw) has been added. In the former case, similar methods as proposed by Nachamkin (2004) and Micheas et al. (2007) can be performed. Note, however, that the exact methodology differs in each case (see the associated help files). The function hiw allows for an object to be set up that can then be analyzed using the shapes package in R (its summary function gives some of the SS values, namely: location/translation, and average, minimum and maximum intensity SS).
This version fixes the bugs introduced with the previous version (0.1-8), but also includes some new functionality. Specifically, a new function called MergeForce takes objects returned by functions like centmatch and deltamm and forces the mergings into a new "matched" object. Further, new MODE functions have been addd that more-or-less round out the MODE method (one could do infinitely many other things with the MODE approach). In particular, FeatureTable conducts the feature-based contingency table approach and interester carries out the fuzzy logic approach of MODE.
No new methods added to this version, but numerous bugs have been fixed, and the feature-based code has been greatly simplified. In particular, FeatureSuite, FeatureFunPrep and their method functions have been removed in favor of the users' performing each step in the feature-based paradigm themselves. This change should not only make the coding side of things easier, but it should also prove easier and more transparent for the user. One bug since this release that has been found (and will be fixed for the next version) is that saller, threshfac and threshsizer did not get the necessary updates, and so are temporarily unusable (this package is still a work in progress).
Notable for this release is that the centroid distance is no longer calculated using the spatstat function centroid.owin and can be given in either lon/lat or indexed coordinates. Centroid distance can now be computed using any user-provided distance function with the correct formatting. The default (which generally should be preferred) is to use rdist from package fields, but if great-circle distance is appropriate, then rdist.earth (also from fields) can be used (in which case the centroid will be automatically calculated from the user-provided lon/lat coordinates given in the call to make.SpatialVx). Additionally, both centmatch and deltamm have been fixed and simplified (they were not working correctly in all cases). Further, deltamm was slow because it was having to re-calculate the distance maps for every call to deltametric. Now, deltametric is not called, the distance maps are first calculated and re-used as appropriate, and most (not all) metrics that were calculated multiple times are now only calculated once. The result is a much faster function.
Only minor bug fixes (more to come) and some new method functions.
This version contains some improvements to the features-based verification functions; e.g., some new method functions for some of the main functions. There is also now a function for calculating the bearing from one feature to another (see the function help file for bearing), which was adapted from code written by Randy G. Bullock.
A function for doing the cluster analysis procedure described in Marzban and Sandgathe (2006), called clusterer has been added. Additionally, Caren Marzban provided some functions written by Hillary Lyons for doing the variation of the cluster analysis approach proposed in Marzban and Sandgathe (2008), which have been modified and are included as CSIsamples. The package fastcluster is now required, which enables use of a fast cluster procedure through hclust (Daniel Müllner, fastcluster: Fast hierarchical clustering routines for R and Python, Version 1.1.6, 2012. Available at CRAN http://cran.r-project.org and http://math.stanford.edu/~muellner).
In addition to the cluster analysis, Caren Marzban also provided code for performing the optical flow verification procedure introduced in Marzban and Sandgathe (2010), which too has been modified and made available primarily through the new funciton OF.
Functionality to estimate the empirical structure function described in Harris et al (2001) has been added via the functions structurogram and structurogram.matrix.
For each of the objects returned by the cluster analysis, optical flow and structure functions, there are method functions (e.g., plot, summary, hist) available. See the help files for each function to see which specific method functions are available.
In addition to the new required package fastcluster, two other packages are now required: CircStats (S-plus original by Ulric Lund and R port by Claudio Agostinelli
(2009). CircStats: Circular Statistics, from "Topics in circular Statistics" (2001). R package version 0.2-4. http://CRAN.R-project
.org/package=CircStats), which allows for computing statistics for cicular data (e.g., feature axis angles, bearings, etc.) as well as plotting features such as rose.diag. The maps package is also now required (Original S code by Richard A. Becker and Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P. Minka
New for this version of SpatialVx is the 2-d Gaussian Mixture Model (GMM) method introduced in Lakshmanan and Kain (2010). In particular, the function gmm2d fits a GMM to both fields of a verification set and its summary function calculates the various errors. For speed, the turboem function from package turboEM is employed (Bobb, J. F., H. Zhao and R. Varadhan, 2012. turboEM: A Suite of Convergence Acceleration Schemes for EM and MM algorithms. R package version 2012.2-1. http://CRAN.R-project.org/package=turboEM).
Also new in this version are the functions S1, which calculates the S1 score, and ACC, which calculates the anomoly correlation between two fields (provided climatologies are supplied).
This version adds a few new methods to the mix. In particular, the spatial prediction comparison test (SPCT) proposed by Hering and Genton (2011) has been added via the functions spatMLD, fit.spatMLD and their summary and plot method functions (summary actually does the test once
In association with the above method, a variogram function is included that is a modification of vgram.matrix from the fields package called variogram.matrix, which works exaclty the same as vgram.matrix, but allows for missing values. The function vgram.matrix is a fast function for calculating the empirical variogram for matrices (i.e., appropriate when the data are gridded so that grid information can be used to calculate distances). In particular, the plot method function for vgram.matrix also works for variogram.matrix. Therefore, the SPCT can be applied to precipitation fields when grid points with zeros in the verification field, and two forecast fields are all zero. This is accomplished (automatically by spatMLD if zero.out=TRUE) by setting zero values to NA so that they are subsequently not involved in any calculations. Thus, this can be used to make variogram plots such as those proposed in Marzban and Sandgathe (2009).
Some geometric characterization measures introduced in AghaKouchak et al. (2011) are included here. These include the connectivity index (Cindex), shape index (Sindex), and area index (Aindex).
Finally, the bias-corrected versions of the threat score and Gilbert Skill Score introduced in Mesinger (2008) are included in the vxstats function.
While still a work-in-progress, this version expands considerably on the intial release. Future releases will be aimed at including all of the methods applied in the ICP, as well as some others. This version contains most of the location measures/metrics including (see Baddeley, 1992) : (i) Baddeley's binary image metric, (ii) Hausdorff metric, (iii) partial Hausdorff measure, (iv) mean error distance, (v) mean square error distance, (vi) Pratt's Figure of Merit (FOM), (vii) minimum separation between boundaries, (viii) modified Hausdorff distance, (ix) metrV (Zhu et al, 2011), and (x) the Forecast Quality Index (FQI) of Venugopal et al. (2005).
In addition to the above location metrics/measures, the wavelet denoising technique of Briggs and Levine (1997) (a neighborhood type method) is included, along with the scale separation wavelet methods of Briggs and Levine (1997), Casati et al. (2004), and Casati (2010). Functions from the waveslim package are used, and in particular it is allowed to have non-dyadic fields using the maximal overlap discrete wavelet transform (see help file for SpatialVx, and the associated functions for more information and references).
Some features-based functionality has also been added, though these functions are still in their infancy, and major changes could happen in future releases. However, there is functionality to do object identification in the manner of convolution thresholding (Davis et al., 2006) and just thresholding (both of which use the connected function from the spatstat package, which uses a very computationally efficient algorithm for labelling connected objects in a field), some merging and matching (e.g., by centroid distance ala Davis et al., 2006, and using Baddeley's delta metric ala Gilleland et al., 2008), and some analysis techniques including the SAL method (Wernli et al., 2008), as well as functions for simply computing various properties of either single objects or object pairs (e.g., axis angle, angle difference, aspect ratio, centroid, centroid distance, the various location metrics and measures mentioned above, area, intersection area, area ratio).
In addition to the methods outlined in Gilleland et al. (2009, 2010), there is also the field significance method of Elmore et al. (2006). The functions used here rely on functions from package boot.
This new version of SpatialVx relies heavily on existing R packages, and therefore has new dependencies. In particular, the location metrics/measures that rely on quickly computing distances between every point in the field, and some binary event utilize functions such as deltametric and distmap from package spatstat (A. Baddeley and R. Turner, 2005. Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12 (6), 1-42. ISSN: 1548-7660. URL: www.jstatsoft.org). Many functions from this same package are also used for the features-based methods, as this package already contains many functions for doing image analysis work. Additionally, the package smatr is used for finding the major axis of features/objects (David Warton, Remko Duursma, Daniel Falster and Sara Taskinen, 2011. smatr: (Standardised) Major Axis Estimation and Testing Routines. R package version 3.2.4. http://CRAN.R-project.org/package=smatr). Package waveslim is used for all of the wavelet analysis functions (Brandon Whitcher, 2010. waveslim: Basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.6.4. http://CRAN.R-project.org/package=waveslim). Finally, the boot package is used for the field significance approach (Angelo Canty and Brian Ripley, 2012. boot: Bootstrap R (S-Plus) Functions. R package version 1.3-4.). Thanks to Brandon Whitcher for agreeing to re-build/submit waveslim in the new R version so that it can be used here.
At present, the maps package (Original S code by Richard A. Becker and Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka
Version 0.1-0 (The initial release)
The initial release (version 0.1-0) included functions for doing most of the neighborhood forecast verification methods including (see Ebert, 2008): (i) Fractions Skill Score (FSS), (ii) upscaling, (iii) minimum coverage, (iv) fuzzy logic, (v) multi-event contingency table, (vi) pragmatic, and (vii) practically perfect hindcast. At this point, no functionality for handling more than one forecast time is provided, but such ability is planned for a future release.
Package dependcies for this initial release are limited to fields (Reinhard Furrer, Douglas Nychka and Stephen Sain (2012). fields: Tools for spatial data. R package version 6.6.3. http://CRAN.R-project.org/package=fields), and it is expected that many more of these functions will utilize tools from this package. Thanks to Doug Nychka for re-building/submitting this package under the newer R versions so that it could be installed with this package!
You can register here, and then you will be given the opportunity to sign up to receive email notifications about version updates, etc.
*Any information collected is used solely to determine the legitimacy of subscription requests (e.g., to protect against spam). Email addresses are added to a cont rolled list (only subscribers may send messages). The list is intended primarily for participants in the ICP, but others wishing to keep updated on the progress of the project may als o subscribe. If you have any trouble subscribing (or unsubscribing) contact the webmaster (Eric Gilleland a>).
The National Center for Atmospheric Research is sponsored by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.