K-Fold Cross Validation
K-fold cross validation options for SurrogateTrainer are in development. In k-fold cross validation, the sampler data is divided into $k$ non-overlapping folds. The model is retrained $k$ times, in each instance with a different fold held back for testing. In this way, each predictor/response pair is used in training $k-1$ models and used for evaluation exactly once. The performance of the model in predicting responses for the held-back values is reported as the root-mean-square error (RMSE).
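For a held-out fold containing $n$ predictor/response pairs $(x_i, y_i)$, with surrogate predictions $\hat{y}_i$, the reported error is the standard RMSE:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}.$$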
Currently, this tool has only been demonstrated for PolynomialRegressionTrainer, but demonstrations for other trainers (such as PolynomialChaosTrainer) will be made available in the near future; a preliminary comparison across several trainer types is given at the end of this example. This example demonstrates how the cross validation capabilities in the SurrogateTrainer class are used. It builds on Polynomial Regression Surrogate, using the same physical problem and uncertain parameters; see that example for further information on setting up the training and model evaluation for this problem.
Model Problem
This example uses a one-dimensional heat conduction problem as the full-order model, which has certain uncertain parameters. The model equation is as follows:

$$-k \frac{d^2 T}{dx^2} = q \,, \quad x \in [0, L],$$

with an insulated left boundary and a fixed temperature on the right boundary:

$$\left.\frac{dT}{dx}\right|_{x=0} = 0 \,, \quad T(x{=}L) = T_{\infty}.$$
The quantities of interest are the average and maximum temperature:

$$\bar{T} = \frac{1}{L} \int_0^L T \, dx \,, \quad T_{\max} = \max_{x \in [0,L]} T.$$
Parameter Uncertainty
For demonstration, each of these parameters will have two types of probability distributions: Uniform ($\mathcal{U}(a, b)$) and Normal ($\mathcal{N}(\mu, \sigma)$), where $a$ and $b$ are the minimum and maximum bounds of the uniform distribution, respectively, and $\mu$ and $\sigma$ are the mean and standard deviation of the normal distribution, respectively.
The uncertain parameters for this model problem are:
| Parameter | Symbol | Uniform | Normal |
| --- | --- | --- | --- |
| Conductivity | $k$ | $\mathcal{U}(1, 10)$ | $\mathcal{N}(5, 2)$ |
| Volumetric Heat Source | $q$ | $\mathcal{U}(9000, 11000)$ | $\mathcal{N}(10000, 500)$ |
| Domain Size | $L$ | $\mathcal{U}(0.01, 0.05)$ | $\mathcal{N}(0.03, 0.01)$ |
| Right Boundary Temperature | $T_{\infty}$ | $\mathcal{U}(290, 310)$ | $\mathcal{N}(300, 10)$ |
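In the stochastic master input, each of these is declared as a Distributions object. The following is a minimal sketch for the uniform case, using the bounds from the table above (the block names, such as k_dist, are placeholders); the normal case is analogous, with Normal objects taking mean and standard_deviation parameters.

```
[Distributions]
  # One distribution per uncertain parameter; bounds from the table above
  [k_dist]
    type = Uniform
    lower_bound = 1
    upper_bound = 10
  []
  [q_dist]
    type = Uniform
    lower_bound = 9000
    upper_bound = 11000
  []
  [L_dist]
    type = Uniform
    lower_bound = 0.01
    upper_bound = 0.05
  []
  [Tinf_dist]
    type = Uniform
    lower_bound = 290
    upper_bound = 310
  []
[]
```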
Analytical Solutions
This simple model problem has analytical descriptions for the field temperature, average temperature, and maximum temperature:

$$T(x) = \frac{q}{2k}\left(L^2 - x^2\right) + T_{\infty},$$

$$\bar{T} = \frac{q L^2}{3k} + T_{\infty},$$

$$T_{\max} = T(x{=}0) = \frac{q L^2}{2k} + T_{\infty}.$$
Because the field temperature is quadratic in $x$, using quadratic elements in the discretization will actually yield the exact solution.
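For reference, these expressions follow directly from integrating the governing equation twice and applying the two boundary conditions:

$$\frac{dT}{dx} = -\frac{q}{k} x \quad \left(\text{from } \left.\frac{dT}{dx}\right|_{x=0} = 0\right), \qquad T(x) = T_{\infty} + \frac{q}{2k}\left(L^2 - x^2\right) \quad \left(\text{from } T(L) = T_{\infty}\right),$$

and averaging over the domain gives

$$\bar{T} = \frac{1}{L} \int_0^L T \, dx = T_{\infty} + \frac{q}{2k}\left(L^2 - \frac{L^2}{3}\right) = T_{\infty} + \frac{q L^2}{3k}.$$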
Input File
Below is the input file used to solve the one-dimensional heat conduction model.
(contrib/moose/modules/stochastic_tools/examples/surrogates/sub.i)

With this input, the uncertain parameters are defined by:
- `Materials/conductivity/prop_values`
- `Kernels/source/value`
- `Mesh/xmax`
- `BCs/right/value`
These values in the sub.i file are arbitrary, since the stochastic master app will be modifying them.
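For context, below is a sketch of how the master app typically drives these parameters in the stochastic_tools examples, pairing a SamplerFullSolveMultiApp with a SamplerParameterTransfer. The names sub and sample are placeholders, and the exact parameter set may vary slightly between MOOSE versions:

```
[MultiApps]
  [sub]
    type = SamplerFullSolveMultiApp
    input_files = sub.i
    sampler = sample
  []
[]

[Transfers]
  [params]
    type = SamplerParameterTransfer
    to_multi_app = sub
    sampler = sample
    # Paths to the controllable parameters listed above
    parameters = 'Materials/conductivity/prop_values Kernels/source/value Mesh/xmax BCs/right/value'
  []
[]
```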
Cross Validation
To perform cross validation, a SurrogateModel must be included in the input file along with the trainer. This is used to calculate predictions for the holdout set. The following options can be used to control the cross validation routine:
"cv_n_trials": The number of repeated cross validation trials to perform. This option can be used to better estimate the performance of the model
"cv_splits": The number of splits (k) to use in cross validation.
"cv_surrogate": The
SurrogateModel
object to use for evaluating error compared to the test data for each split."cv_type": The type of cross-validation to perform. Currently, the only options are
none
ork_fold
.
The following input file snippet provides an example of performing repeated 5-fold cross validation for 100 trials using a PolynomialRegressionTrainer and PolynomialRegressionSurrogate, for the example one-dimensional heat conduction model used in Training a Surrogate Model. Please refer to the documentation for this model type for details on other options. It is also important to note that the values in GlobalParams could have been set in Trainers/pr_max instead.
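A minimal sketch of the relevant blocks is given below; the object names (pr_max, pr_surrogate, sample) and the response reporter name are placeholders standing in for the full listing, while the cross validation parameters follow the option list above:

```
[Trainers]
  [pr_max]
    type = PolynomialRegressionTrainer
    regression_type = ols
    max_degree = 3
    sampler = sample
    response = results/data:max:value    # placeholder response name
    # Cross validation options
    cv_type = k_fold
    cv_splits = 5
    cv_n_trials = 100
    cv_surrogate = pr_surrogate
  []
[]

[Surrogates]
  [pr_surrogate]
    type = PolynomialRegressionSurrogate
    trainer = pr_max
  []
[]
```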
Results and Analysis
In this section, cross validation results for uniform and normal parameter distributions are provided. Here, we've only trained models for the maximum temperature $T_{\max}$ for simplicity. A short analysis of the results is provided as well, to showcase potential issues the user might encounter when using polynomial regression.
For reference, results for $T_{\max}$ from Polynomial Regression Surrogate are summarized in Table 1.
Table 1: The reference results for the mean and standard deviation of the maximum temperature.
| Moment | Uniform | Normal |
| --- | --- | --- |
| $\mu$ | 301.3219 | 301.2547 |
| $\sigma$ | 5.9585 | 10.0011 |
Uniform parameter distributions
First, we examine results from cross validation of a third degree polynomial regression for uniformly distributed parameters. For comparison, 2-fold, 5-fold, and 10-fold cross validation was used. Because $k$-fold CV does not test every possible splitting of the training data, the resulting RMSE can vary significantly depending on the splits used. To better estimate the model's performance, repeated cross validation can be performed. For each $k$, cross validation was repeated 1e5 times to obtain a more representative RMSE across the set of trials; for this example, the mean and standard deviation of the RMSE were calculated. The results of these trials are summarized in Table 2. For this learning problem, the cross validation results seem to support the use of a third degree polynomial regression: in all cases, the mean RMSE was less than 0.1% of the mean $T_{\max}$.
Table 2: Mean and standard deviation of RMSE scores obtained for 1e5 repeated cross validation trials, for uniform parameter distributions.
| Moment | 2-fold | 5-fold | 10-fold |
| --- | --- | --- | --- |
| $\mu$ | 0.2496 | 0.2484 | 0.2483 |
| $\sigma$ | 0.0017 | 6.5e-4 | 4.1e-4 |
Distributions of the RMSE obtained from repeated cross validation are shown in Figure 1. For larger $k$, the size of each fold decreases and the model has access to more of the training data. Because of this, we expect less variance in the resulting RMSE scores for greater $k$, and this is reflected in the plot. For 2-fold cross validation, the distribution is wide, indicating that any given trial of 2-fold CV may provide a poor measure of model performance compared to 5- or 10-fold CV. As a trade-off, increasing $k$ increases training expense: more models must be trained, each on a larger subset of the data. These factors should be kept in mind when cross validating a surrogate model.
Figure 1: Distribution of RMSE reported from 1e5 repetitions of $k$-fold cross validation for the example problem in Training a Surrogate Model.
Normal parameter distributions
Next, we examine results from cross validation of a third degree polynomial regression for normally distributed parameters. Again, 2-fold, 5-fold, and 10-fold cross validation was repeated for 1e5 trials. The mean and standard deviation of the RMSE for each $k$ are reported in Table 3.
Table 3: Mean and standard deviation of RMSE scores obtained for 1e5 repeated cross validation trials, for normal parameter distributions.
| Moment | 2-fold | 5-fold | 10-fold |
| --- | --- | --- | --- |
| $\mu$ | 8.187 | 8.077 | 8.057 |
| $\sigma$ | 0.1425 | 0.0486 | 0.0297 |
The cross validation results for this learning problem are significantly more pessimistic than the previous one: the RMSE scores across the board are roughly 3% of the mean $T_{\max}$. The reason is straightforward: with Latin Hypercube sampling, unlikely parameter values (in the tails of the normal distributions) are sparsely represented in the sampling data. When these samples are left out for cross validation, the model is trained primarily on parameters near the mean. As a result, the polynomial model will tend to have disproportionately high error for these unlikely parameter/response pairs. This was not observed when using uniformly distributed parameters, as the full parameter range was (roughly) equally represented in all folds.
If high surrogate accuracy is needed for parameters in the tails of the probability distributions, this may indicate that improvements are needed in the modeling or sampling procedure. This is a good example of a case where cross validation can be invaluable in properly assessing deficiencies in a model.
Figure 2: Distribution of RMSE reported from 10000 repetitions of 5-fold cross validation for the example problem in Training a Surrogate Model.
Other surrogate model types
Cross validation can be used to characterize differences in predictive accuracy between different types of surrogate models. For demonstration, the analysis of the preceding sections was repeated for several other surrogate types. However, because the cost of repeated cross validation for large datasets is more significant for some models, only 100 repetitions of cross validation were performed, with 1000 Latin Hypercube samples. This was sufficient to reveal useful differences between model types.
The following model types were used:

- Third degree PolynomialRegression.
- Third degree PolynomialChaos.
- NearestPoint.
- GaussianProcess, with a SquaredExponentialCovariance function (a minimal sketch of this setup follows the list). Length scales for each input parameter were chosen by first performing hyperparameter tuning with a reduced training data set.
- LibtorchANN, with a single hidden layer and 64 neurons.
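As an illustration of the extra setup the Gaussian process requires, its covariance function is declared as a separate object and referenced by the trainer. The following is a minimal sketch, assuming the GaussianProcessTrainer and SquaredExponentialCovariance objects; the hyperparameter values shown are placeholders rather than the tuned values used in this study (those appear in the listing referenced below):

```
[Covariance]
  [covar]
    type = SquaredExponentialCovariance
    signal_variance = 1.0              # placeholder hyperparameters, not the
    noise_variance = 1e-3              # tuned values from this study
    length_factor = '1.0 1.0 1.0 1.0'  # one length scale per uncertain parameter
  []
[]

[Trainers]
  [gp_max]
    type = GaussianProcessTrainer
    covariance_function = covar
    sampler = sample                   # placeholder names, as before
    response = results/data:max:value
  []
[]
```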
The following listing summarizes the Surrogate types considered in this comparison, along with the required control parameters.
(contrib/moose/modules/stochastic_tools/examples/surrogates/cross_validation/all_trainers_uniform_cv.i)

RMSE scores for each model type were accumulated over 100 repeated trials of 5-fold cross validation, for uniform parameter distributions only. In the following table, the results are summarized by a simple mean and standard deviation of the RMSE across all trials.
Table 4: Mean and standard deviation of RMSE scores for all model types, obtained for 100 repeated cross validation trials with uniform parameter distributions.
| Moment | Polynomial Regression | Polynomial Chaos | Nearest Point | Gaussian Process | Libtorch ANN |
| --- | --- | --- | --- | --- | --- |
| $\mu$ | 0.278 | 30.727 | 3.724 | 0.207 | 0.189 |
| $\sigma$ | 0.004 | 1.141 | 0.066 | 0.007 | 0.019 |
Table 4 summarizes the model comparison results with uniform parameter distributions. An immediately striking observation is that the RMSE for the Polynomial Chaos model was two orders of magnitude greater (roughly 10% of the mean $T_{\max}$) than that observed with several of the other models. This is expected, as Polynomial Chaos is known to perform poorly for single-point evaluations and is primarily used as an effective means to characterize statistical moments of a response (see PolynomialChaos). The NearestPoint model also has significantly greater validation error than the remaining model types. This is likewise expected, because NearestPoint (a piecewise constant model) is generally a coarse approximation.
For the more sophisticated model types, the mean RMSE across the trial set was comparable and low, indicating that any of these models would be similarly effective as a surrogate for this problem. However, it is important to note that the Libtorch neural network showed greater variability in validation error than either the Polynomial Regression or Gaussian Process models for this problem ($\sigma = 0.019$, compared to $0.004$ and $0.007$, respectively). This indicates that the neural network was more sensitive to variations in the training set than these other models. This could be caused by several factors, such as overfitting, and may indicate a need to better tune the parameters used to define the model.