K-Fold Cross Validation

K-fold cross validation options for SurrogateTrainer are in development. In k-fold cross validation, the sampler data is divided into k non-overlapping folds. The model is retrained k times, each time with a different fold held back for testing. In this way, each predictor/response pair is used to train k-1 models and is used for evaluation exactly once. The performance of the model in predicting the held-back responses is reported as the root-mean-square error (RMSE):

\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{n}}
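
To make the procedure concrete, the following Python sketch implements one pass of k-fold cross validation for a generic model exposing fit/predict methods (the kfold_rmse helper and the model interface are illustrative assumptions, not part of MOOSE):

import numpy as np

def kfold_rmse(model, X, y, k=5, rng=None):
    # Shuffle the sample indices once, then split them into k
    # non-overlapping folds.
    rng = rng if rng is not None else np.random.default_rng()
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]                                    # held-back fold
        train = np.concatenate(folds[:i] + folds[i + 1:])  # remaining k-1 folds
        model.fit(X[train], y[train])                      # retrain the model
        residual = model.predict(X[test]) - y[test]
        scores.append(np.sqrt(np.mean(residual ** 2)))     # RMSE on the holdout
    return np.array(scores)

Repeating this routine with fresh shuffles yields the repeated cross validation statistics discussed later in this example.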

Currently, this tool has only been demonstrated for PolynomialRegressionTrainer, but demonstrations for other trainers (such as PolynomialChaosTrainer) will be made available in the near future. This example demonstrates how the cross validation capabilities of the SurrogateTrainer class are used. It builds on Polynomial Regression Surrogate, using the same physical problem and uncertain parameters; see that example for further information on setting up the training and model evaluation for this problem.

Model Problem

This example uses a one-dimensional heat conduction problem with several uncertain parameters as the full-order model. The model equation is as follows:

-k\frac{d^2T}{dx^2} = q \,, \quad x\in[0, L]

\begin{aligned} \left.\frac{dT}{dx}\right|_{x=0} &= 0 \\ T(x=L) &= T_{\infty} \end{aligned}

The quantities of interest are the average and maximum temperature:

\begin{aligned} \bar{T} &= \frac{\int_{0}^{L}T(x)\,dx}{L} \\ T_{\max} &= \max_{x\in[0,L]}T(x) \end{aligned}

Parameter Uncertainty

For demonstration, each of these parameters is given two types of probability distributions: uniform (U(a, b)) and normal (N(μ, σ)), where a and b are the minimum and maximum bounds of the uniform distribution, respectively, and μ and σ are the mean and standard deviation of the normal distribution, respectively.

The uncertain parameters for this model problem are:

| Parameter | Symbol | Uniform | Normal |
| --- | --- | --- | --- |
| Conductivity | k | U(1, 10) | N(5, 2) |
| Volumetric Heat Source | q | U(9000, 11000) | N(10000, 500) |
| Domain Size | L | U(0.01, 0.05) | N(0.03, 0.01) |
| Right Boundary Temperature | T∞ | U(290, 310) | N(300, 10) |

Analytical Solutions

This simple model problem has analytical descriptions for the field temperature, average temperature, and maximum temperature:

\begin{aligned} T(x,k,q,L,T_{\infty}) &= \frac{q}{2k}\left(L^2 - x^2\right) + T_{\infty} \\ \bar{T}(k,q,L,T_{\infty}) &= \frac{qL^2}{3k} + T_{\infty} \\ T_{\max}(k,q,L,T_{\infty}) &= \frac{qL^2}{2k} + T_{\infty} \end{aligned}

Because the temperature field is quadratic in x, using quadratic elements in the discretization yields the exact solution.
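
As a quick sanity check of these expressions, the short Python snippet below evaluates the average and maximum temperature at nominal parameter values (here taken, for illustration, as the means of the normal distributions listed above):

# Nominal parameter values: the means of the normal distributions above.
k, q, L, t_inf = 5.0, 10000.0, 0.03, 300.0

t_bar = q * L ** 2 / (3 * k) + t_inf  # average temperature
t_max = q * L ** 2 / (2 * k) + t_inf  # maximum temperature, attained at x = 0
print(t_bar, t_max)                   # 300.6 300.9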

Input File

Below is the input file used to solve the one-dimensional heat conduction model.

[Mesh]
  type = GeneratedMesh
  dim = 1
  nx = 100
  xmax = 1
  elem_type = EDGE3
[]

[Variables]
  [T]
    order = SECOND
    family = LAGRANGE
  []
[]

[Kernels]
  [diffusion]
    type = MatDiffusion
    variable = T
    diffusivity = k
  []
  [source]
    type = BodyForce
    variable = T
    value = 1.0
  []
[]

[Materials]
  [conductivity]
    type = GenericConstantMaterial
    prop_names = k
    prop_values = 2.0
  []
[]

[BCs]
  [right]
    type = DirichletBC
    variable = T
    boundary = right
    value = 300
  []
[]

[Executioner]
  type = Steady
  solve_type = PJFNK
  petsc_options_iname = '-pc_type -pc_hypre_type'
  petsc_options_value = 'hypre boomeramg'
[]

[Postprocessors]
  [avg]
    type = AverageNodalVariableValue
    variable = T
  []
  [max]
    type = NodalExtremeValue
    variable = T
    value_type = max
  []
[]

[Outputs]
[]
(contrib/moose/modules/stochastic_tools/examples/surrogates/sub.i)

With this input the uncertain parameters are defined as:

  1. k → Materials/conductivity/prop_values

  2. q → Kernels/source/value

  3. L → Mesh/xmax

  4. T∞ → BCs/right/value

The values of these parameters in the sub.i file are arbitrary, since the stochastic master app overrides them at runtime.

Cross Validation

To perform cross validation, a SurrogateModel must be included in the input file along with the trainer. This is used to calculate predictions for the holdout set. The following options can be used to control the cross validation routine:

  • "cv_n_trials": The number of repeated cross validation trials to perform. This option can be used to better estimate the performance of the model

  • "cv_splits": The number of splits (k) to use in cross validation.

  • "cv_surrogate": The SurrogateModel object to use for evaluating error compared to the test data for each split.

  • "cv_type": The type of cross-validation to perform. Currently, the only options are none or k_fold.

The following input file snippet provides an example of performing repeated 5-fold cross validation for 100 trials using a PolynomialRegressionTrainer and PolynomialRegressionSurrogate, for the example one-dimensional heat conduction model used in Training a Surrogate Model. Please refer to the documentation for this model type for details on other options. It is also important to note that the values in GlobalParams could have been set in Trainers/pr_max instead.

[GlobalParams]
  sampler = cv_sampler
  response = results/response_data:max:value
  cv_type = "k_fold"
  cv_splits = 5
  cv_n_trials = 100
[]

[Distributions]
  [k_dist]
    type = Uniform
    lower_bound = 1
    upper_bound = 10
  []
  [q_dist]
    type = Uniform
    lower_bound = 9000
    upper_bound = 11000
  []
  [L_dist]
    type = Uniform
    lower_bound = 0.01
    upper_bound = 0.05
  []
  [Tinf_dist]
    type = Uniform
    lower_bound = 290
    upper_bound = 310
  []
[]

[Samplers]
  [cv_sampler]
    type = LatinHypercube
    distributions = 'k_dist q_dist L_dist Tinf_dist'
    num_rows = 6560
    execute_on = PRE_MULTIAPP_SETUP
  []
[]

[Trainers]
  [pr_max]
    type = PolynomialRegressionTrainer
    regression_type = "ols"
    max_degree = 3
    cv_surrogate = "pr_surr"
    execute_on = timestep_end
  []
[]

[Surrogates]
  [pr_surr]
    type = PolynomialRegressionSurrogate
    trainer = pr_max
  []
[]
(contrib/moose/modules/stochastic_tools/examples/surrogates/cross_validation/all_trainers_uniform_cv.i)

Results and Analysis

In this section, cross validation results for uniform and normal parameter distributions are provided. Here, we've only trained models for T_max for simplicity. A short analysis of the results follows, to showcase potential issues the user might encounter when using polynomial regression.

For reference, results for T_max from Polynomial Regression Surrogate are summarized in Table 1.

Table 1: The reference results for the mean and standard deviation of the maximum temperature.

| Moment | Uniform | Normal |
| --- | --- | --- |
| μ_Tmax | 301.3219 | 301.2547 |
| σ_Tmax | 5.9585 | 10.0011 |

Uniform parameter distributions

First, we examine results from cross validation of a third degree polynomial regression for uniformly distributed parameters. For comparison, 2-fold, 5-fold, and 10-fold cross validation was used. Because k-fold CV does not test every possible splitting of the training data, the resulting RMSE can vary significantly depending on the splits used. To better estimate the model's performance, repeated cross validation can be performed. For each k, cross validation was repeated n = 1e5 times, and the mean and standard deviation of the RMSE were calculated across the set of trials. The results of these trials are summarized in Table 2. For this learning problem, the cross validation results support the use of a third degree polynomial regression: in all cases, the mean RMSE was less than 0.1% of the mean T_max.

Table 2: Mean and standard deviation of RMSE scores obtained for 1e5 repeated cross validation trials, for uniform parameter distributions.

| Moment | 2-fold | 5-fold | 10-fold |
| --- | --- | --- | --- |
| μ_RMSE | 0.2496 | 0.2484 | 0.2483 |
| σ_RMSE | 0.0017 | 6.5e-4 | 4.1e-4 |

Distributions of the RMSE obtained from repeated cross validation are shown in Figure 1. For larger k, the size of each fold decreases and the model is trained on more of the data in each split. Because of this, we expect less variance in the resulting RMSE scores for greater k, which is reflected in the plot. For 2-fold cross validation, the distribution is wide, indicating that any given trial of 2-fold CV may provide a poor measure of model performance compared to 5- or 10-fold CV. As a trade-off, increasing k increases training expense: more models must be trained, each on a larger subset of the data. These factors should be kept in mind when cross validating a surrogate model.

Figure 1: Distribution of RMSE reported from 1e5 repetitions of k-fold cross validation for the example problem in Training a Surrogate Model.
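
For experimentation outside of MOOSE, the repeated-CV study above can be approximated in a few lines of Python with scikit-learn. The sketch below is an offline analogue under stated assumptions: the analytical T_max expression stands in for the full-order model, plain uniform sampling stands in for Latin Hypercube sampling, and only 100 repetitions are used for brevity, so the printed statistics will only roughly resemble Table 2:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Uniform samples of (k, q, L, T_inf) matching the [Distributions] block.
X = rng.uniform([1, 9000, 0.01, 290], [10, 11000, 0.05, 310], size=(6560, 4))
y = X[:, 1] * X[:, 2] ** 2 / (2 * X[:, 0]) + X[:, 3]  # analytical T_max

# Third degree polynomial regression, scored by RMSE on each held-back fold.
surrogate = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
for k in (2, 5, 10):
    cv = RepeatedKFold(n_splits=k, n_repeats=100, random_state=0)
    rmse = -cross_val_score(surrogate, X, y, cv=cv,
                            scoring="neg_root_mean_squared_error")
    print(f"{k}-fold: mean RMSE = {rmse.mean():.4f}, std = {rmse.std():.2e}")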

Normal parameter distributions

Next, we examine results from cross validation of a third degree polynomial regression for normally distributed parameters. Again, 2-fold, 5-fold, and 10-fold cross validation was repeated for 1e5 trials. The mean and standard deviation of the RMSE for each k are reported in Table 3.

Table 3: Mean and standard deviation of RMSE scores obtained for 1e5 repeated cross validation trials, for normal parameter distributions.

| Moment | 2-fold | 5-fold | 10-fold |
| --- | --- | --- | --- |
| μ_RMSE | 8.187 | 8.077 | 8.057 |
| σ_RMSE | 0.1425 | 0.0486 | 0.0297 |

The cross validation results for this learning problem are significantly more pessimistic than the previous one: the RMSE scores across the board are roughly 3% of the mean T_max. The reason is straightforward: with Latin Hypercube sampling, unlikely parameter values (in the tails of the normal distributions) are sparsely represented in the sampling data. When these samples are left out for cross validation, the model is trained primarily on parameters near the mean. As a result, the polynomial model tends to have disproportionately high error for these unlikely parameter/response pairs. This was not observed when using uniformly distributed parameters, as the full parameter range was (roughly) equally represented in all folds.
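
This tail effect can be reproduced in isolation with a small Python sketch (an illustration under simplifying assumptions: a single uncertain parameter k, the analytical T_max as the response, and clipped-normal sampling in place of Latin Hypercube):

import numpy as np

rng = np.random.default_rng(0)

def t_max(k, q=10000.0, L=0.03, t_inf=300.0):
    # Analytical maximum temperature from the model problem.
    return q * L ** 2 / (2 * k) + t_inf

k_test = np.linspace(1.0, 10.0, 500)
for name, k_train in [
    ("uniform", rng.uniform(1.0, 10.0, 1000)),
    ("normal ", np.clip(rng.normal(5.0, 2.0, 1000), 1.0, 10.0)),
]:
    # Fit a cubic polynomial in k to the sampled responses.
    coef = np.polyfit(k_train, t_max(k_train), deg=3)
    err = np.abs(np.polyval(coef, k_test) - t_max(k_test))
    tail = k_test < 2.0  # region sparsely sampled under the normal distribution
    print(name, "tail error:", err[tail].max(),
          "central error:", err[~tail].max())

The cubic fitted to normally distributed samples concentrates its accuracy near the mean of k and typically shows much larger error near k = 1, where T_max varies most rapidly.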

If high surrogate accuracy is needed for parameters in the tails of the probability distributions, this may indicate improvements are needed in the modeling or sampling procedure. This is a good example of a case where cross validation can be invaluable in properly assessing deficiencies in a model.

Figure 2: Distribution of RMSE reported from 10000 repetitions of 5-fold cross validation for the example problem in Training a Surrogate Model.

Other surrogate model types

Cross validation can be used to characterize differences in predictive accuracy between different types of surrogate models. For demonstration, the analysis of the preceding sections was repeated for several other Surrogate types. However, because the cost of repeated cross validation on large datasets is more significant for some models, only 100 repetitions of cross validation were performed, with 1000 Latin Hypercube samples. This was sufficient to reveal useful differences between model types.

The model types used were Polynomial Regression, Polynomial Chaos, Nearest Point, Gaussian Process, and a Libtorch artificial neural network (see Table 4). The following listing summarizes the corresponding trainers, along with the required control parameters.

[GlobalParams]
  sampler = cv_sampler
  response = results/response_data:max:value
  cv_type = "k_fold"
  cv_splits = 5
  cv_n_trials = 100
[]

[Trainers]
  [pr_max]
    type = PolynomialRegressionTrainer
    regression_type = "ols"
    max_degree = 3
    cv_surrogate = "pr_surr"
    execute_on = timestep_end
  []
  [pc_max]
    type = PolynomialChaosTrainer
    order = 3
    distributions = "k_dist q_dist L_dist Tinf_dist"
    cv_surrogate = "pc_surr"
    execute_on = timestep_end
  []
  [np_max]
    type = NearestPointTrainer
    cv_surrogate = "np_surr"
    execute_on = timestep_end
  []
  [gp_max]
    type = GaussianProcessTrainer
    covariance_function = 'rbf'
    standardize_params = 'true'
    standardize_data = 'true'
    cv_surrogate = "gp_surr"
    execute_on = timestep_end
  []
  [ann_max]
    type = LibtorchANNTrainer
    num_epochs = 100
    num_batches = 5
    num_neurons_per_layer = '64'
    learning_rate = 1e-2
    rel_loss_tol = 1e-4
    filename = mynet.pt
    read_from_file = false
    print_epoch_loss = 0
    activation_function = 'relu'
    cv_surrogate = "ann_surr"
    standardize_input = false
    standardize_output = false
  []
[]

[Covariance]
  [rbf]
    type = SquaredExponentialCovariance
    noise_variance = 3.79e-6
    signal_variance = 1 #Use a signal variance of 1 in the kernel
    length_factor = '5.34471 1.41191 5.90721 2.83723' #Select a length factor for each parameter
  []
[]
(contrib/moose/modules/stochastic_tools/examples/surrogates/cross_validation/all_trainers_uniform_cv.i)

RMSE scores for each model type were accumulated over 100 repeated trials of 5-fold cross validation, for uniform parameter distributions only. In the following table, the results are summarized by the mean and standard deviation of the RMSE across all trials.

Table 4: Mean and standard deviation of RMSE scores for all model types, obtained for 100 repeated cross validation trials with uniform parameter distributions.

| Moment | Polynomial Regression | Polynomial Chaos | Nearest Point | Gaussian Process | Libtorch ANN |
| --- | --- | --- | --- | --- | --- |
| μ_RMSE | 0.2783 | 0.727 | 3.724 | 0.207 | 0.189 |
| σ_RMSE | 0.004 | 1.141 | 0.066 | 0.007 | 0.019 |

Table 4 summarizes the model comparison results with uniform parameter distributions. An immediately striking observation is the behavior of the Polynomial Chaos model: its mean RMSE was several times greater than that of the Polynomial Regression and Gaussian Process models, and its trial-to-trial variability exceeded its mean. This is expected, as Polynomial Chaos is known to be poor for individual response predictions, and is primarily used as an effective means of characterizing the statistical moments of a response (see PolynomialChaos). Otherwise, NearestPoint has significantly greater validation error compared to the other model types. This is also expected, because NearestPoint (a piecewise constant model) is generally a poor approximation.

For the more sophisticated model types, the mean RMSE across the trial set was comparable and low, indicating that any of these models would be similarly effective as a surrogate for this problem. However, it is important to note that the Libtorch neural network showed greater variability in validation error than either the Polynomial Regression or Gaussian Process models for this problem: (σ/μ)_ANN ≈ 0.1, compared to (σ/μ)_PR ≈ 0.01 and (σ/μ)_GP ≈ 0.03. This indicates that the neural network was more sensitive to variations in the training set than the other models. This could be caused by several factors, such as overfitting, and may indicate a need to better tune the parameters used to define the model.
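
The coefficients of variation quoted above follow directly from Table 4; a quick arithmetic check (values copied from the table):

import numpy as np

# Mean and standard deviation of RMSE from Table 4, per model type.
mu = np.array([0.2783, 0.727, 3.724, 0.207, 0.189])
sigma = np.array([0.004, 1.141, 0.066, 0.007, 0.019])
for name, cv in zip(["PR", "PC", "NP", "GP", "ANN"], sigma / mu):
    print(f"{name}: sigma/mu = {cv:.2f}")
# Prints: PR: 0.01, PC: 1.57, NP: 0.02, GP: 0.03, ANN: 0.10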