Polynomial Regression Surrogate
This example demonstrates how a polynomial-regression-based surrogate model is trained and used on a parametric problem. The results are compared to those obtained using a Polynomial Chaos (PC) surrogate, and the differences in applicability are highlighted. For more on the regression method used here, see PolynomialRegressionTrainer; details of Polynomial Chaos are available under PolynomialChaos.
Problem Statement
The full-order model in this example is essentially the same as the one described in Training a Surrogate Model. It is a one-dimensional heat conduction model:
$$-k\frac{d^2T}{dx^2} = q, \quad x \in [0, L],$$
$$\left.\frac{dT}{dx}\right|_{x=0} = 0, \qquad T(x=L) = T_{\infty},$$
where $T$ is the temperature, $k$ is the thermal conductivity, $L$ is the length of the domain, $q$ is a heat source and $T_{\infty}$ is the value of the Dirichlet boundary condition. To make the comparison between different surrogate models easier, only the maximum temperature is selected to be the Quantity of Interest (QoI):
$$T_{\max} = \max_{x \in [0, L]} T(x). \tag{1}$$
The problem is parametric in the sense that the solution depends on four input parameters: $k$, $q$, $L$ and $T_{\infty}$. Two problem settings are considered in this example. In the first scenario, all of the parameters are assumed to have Uniform distributions ($\mathcal{U}(a, b)$), while the second considers parameters with Normal distributions ($\mathcal{N}(\mu, \sigma)$). To be more specific, the distributions for the two cases are:
Parameter | Symbol | Uniform | Normal |
---|---|---|---|
Conductivity | $k$ | | |
Volumetric Heat Source | $q$ | | |
Domain Size | $L$ | | |
Right Boundary Temperature | $T_{\infty}$ | | |
The parameters of the uniform distribution are the minimum and maximum bounds, while the parameters of the normal distribution are the mean and standard deviation. The maximum temperature can be determined analytically; it occurs at the zero-flux boundary and is:
$$T_{\max} = \frac{qL^2}{2k} + T_{\infty}.$$
Using this expression and the previously described probability density functions, the mean ($\mu$) and standard deviation ($\sigma$) of the QoI can be computed for reference:
Table 1: The reference results for the mean and standard deviation of the maximum temperature.
Moment | Uniform | Normal |
---|---|---|
$\mu$ | 301.3219 | 301.2547 |
$\sigma$ | 5.9585 | 10.0011 |
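The reference moments can be cross-checked in closed form. The sketch below assumes the analytic maximum temperature $T_{\max} = qL^2/(2k) + T_{\infty}$ (zero-flux left boundary) and mutually independent parameters. The uniform bounds are illustrative assumptions rather than values quoted in this text, and `uniform_moment` is a hypothetical helper name.

```python
import math

def uniform_moment(a, b, n):
    """E[X**n] for X ~ Uniform(a, b); negative n requires 0 < a < b."""
    if n == -1:
        return math.log(b / a) / (b - a)
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

# Assumed (hypothetical) uniform bounds, not taken from the text above.
k_lo, k_hi = 1.0, 10.0          # conductivity
q_lo, q_hi = 9000.0, 11000.0    # volumetric heat source
L_lo, L_hi = 0.01, 0.05         # domain size
T_lo, T_hi = 290.0, 310.0       # right boundary temperature

# A = q L^2 / (2 k): the factors are independent, so the moments factorize.
mean_A = (uniform_moment(q_lo, q_hi, 1) * uniform_moment(L_lo, L_hi, 2)
          * uniform_moment(k_lo, k_hi, -1) / 2.0)
mean_A2 = (uniform_moment(q_lo, q_hi, 2) * uniform_moment(L_lo, L_hi, 4)
           * uniform_moment(k_lo, k_hi, -2) / 4.0)

# T_max = A + T_inf with A and T_inf independent, so the variances add.
mean_T = mean_A + uniform_moment(T_lo, T_hi, 1)
var_T = (mean_A2 - mean_A ** 2) + (T_hi - T_lo) ** 2 / 12.0
std_T = math.sqrt(var_T)

print(f"mean = {mean_T:.4f}, std = {std_T:.4f}")
```

With these assumed bounds, the computed moments land close to the Uniform column of Table 1. The normal case has no comparably simple closed form because of the $1/k$ term, so it is not covered by this sketch.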
Solving the problem without uncertain parameters
The first step towards creating a surrogate model is the generation of a full-order model which can solve the heat conduction problem with fixed parameter combinations and evaluate the QoI in Eq. (1). The complete input file for this case is presented in Listing 1.
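To make the role of the full-order model concrete, the following is a minimal finite-difference sketch of a single deterministic solve. It is not the MOOSE input file; it assumes a zero-flux condition at $x=0$, a fixed temperature at $x=L$, and hypothetical parameter values. The function name `solve_heat_1d` is introduced only for this illustration.

```python
def solve_heat_1d(k, q, L, T_inf, n=50):
    """Solve -k T'' = q on [0, L] with T'(0) = 0 and T(L) = T_inf.

    Uses second-order central differences on n intervals and the Thomas
    algorithm for the resulting tridiagonal system. Returns nodal values.
    """
    h = L / n
    r = q * h * h / k
    # Unknowns are T_0 .. T_{n-1}; T_n = T_inf is known.
    sub = [-1.0] * n    # coefficient of T_{i-1} (sub[0] unused)
    diag = [2.0] * n    # coefficient of T_i
    sup = [-1.0] * n    # coefficient of T_{i+1} (sup[n-1] unused)
    rhs = [r] * n
    sup[0] = -2.0       # ghost-node treatment of the zero-flux boundary
    rhs[n - 1] += T_inf # known Dirichlet value moved to the right-hand side

    # Forward elimination.
    for i in range(1, n):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]

    # Back substitution.
    T = [0.0] * n
    T[n - 1] = rhs[n - 1] / diag[n - 1]
    for i in range(n - 2, -1, -1):
        T[i] = (rhs[i] - sup[i] * T[i + 1]) / diag[i]
    return T + [T_inf]

# Hypothetical fixed parameter combination.
k, q, L, T_inf = 5.0, 10000.0, 0.03, 300.0
temperature = solve_heat_1d(k, q, L, T_inf)
qoi = max(temperature)
analytic = q * L ** 2 / (2.0 * k) + T_inf
print(qoi, analytic)
```

Because the exact solution is a quadratic in $x$, the central-difference scheme reproduces it up to round-off, which makes the analytic maximum a convenient check.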
Listing 1: Complete input file for the heat equation problem in this study.
(contrib/moose/modules/stochastic_tools/examples/surrogates/polynomial_regression/sub.i)
Training surrogate models
Both surrogate models are constructed using some knowledge about the full-order problem. This means that the full-order problem is solved multiple times with different parameter samples and the value of the QoI is stored from each computation. This step is managed by a master input file which creates parameter samples, transfers them to the sub-application and collects the results from the completed computations. For more information about setting up master input files see Training a Surrogate Model and Parameter Study. The two complete training input files used for the two cases with the two different parameter distributions are available under uniform and normal.
The training phase starts with the definition of the distributions in the Distributions block. The uniform distributions can be defined as:
For the case with normal distributions the block changes to:
(contrib/moose/modules/stochastic_tools/examples/surrogates/polynomial_regression/normal_train.i)
As a next step, several parameter instances are prepared by sampling the underlying distributions. The sampling objects are defined in the Samplers block. The generation of these parameter samples differs between the two surrogate models: while polynomial chaos uses samples at specific quadrature points in the parameter space (generated by a QuadratureSampler), the polynomial regression model is trained using samples from a LatinHypercube. The number of samples (num_rows) in the LatinHypercube is set to match the number of samples in the tensor-product quadrature set of the QuadratureSampler.
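The idea behind Latin hypercube sampling and the matching of sample counts can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the MOOSE sampler; the number of quadrature points per parameter and the distribution parameters below are hypothetical.

```python
import random
from statistics import NormalDist

def latin_hypercube(n_rows, n_cols, rng):
    """Return n_rows points in [0, 1)^n_cols with one point per stratum in each column."""
    columns = []
    for _ in range(n_cols):
        # One uniform draw inside each of the n_rows equal-width strata, then shuffle.
        col = [(i + rng.random()) / n_rows for i in range(n_rows)]
        rng.shuffle(col)
        columns.append(col)
    return [list(row) for row in zip(*columns)]

# Hypothetical sizes: 3 quadrature points per parameter, 4 parameters.
points_per_dim, n_params = 3, 4
num_rows = points_per_dim ** n_params   # size of the full tensor-product grid

rng = random.Random(0)
unit_samples = latin_hypercube(num_rows, n_params, rng)

def uniform_inv_cdf(u, lo, hi):
    """Inverse CDF of Uniform(lo, hi)."""
    return lo + u * (hi - lo)

# Map unit-cube samples through inverse CDFs (hypothetical distributions).
k_uniform = [uniform_inv_cdf(row[0], 1.0, 10.0) for row in unit_samples]
T_normal = [NormalDist(300.0, 10.0).inv_cdf(row[3]) for row in unit_samples]
print(num_rows, len(unit_samples))
```

Using the same number of rows for both samplers keeps the training cost of the two surrogates comparable, since each row corresponds to one full-order solve.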
The objects in the Controls, MultiApps, Transfers and Reporters blocks are responsible for managing the communication between master and sub-applications, execution of the sub-applications and the collection of the results. For a more detailed description of these blocks see Parameter Study and Training a Surrogate Model.
The next step is to set up two Trainer objects to generate the surrogate models from the available data. This is done in the Trainers block. Both trainers use the data from Sampler and Reporter objects. A polynomial chaos surrogate of order 8 and a polynomial regression surrogate with a polynomial of degree at most 4 are used in this study. The PolynomialChaosTrainer also needs knowledge of the underlying parameter distributions to be able to select matching polynomials.
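As a rough picture of what a polynomial regression fit involves, the sketch below builds all monomials up to a maximum total degree and computes the coefficients by ordinary least squares via the normal equations. It is a generic illustration under those assumptions, not the PolynomialRegressionTrainer implementation; all function names and the toy data are hypothetical.

```python
from itertools import product

def monomial_exponents(n_inputs, max_degree):
    """All exponent tuples whose total degree is at most max_degree."""
    return [e for e in product(range(max_degree + 1), repeat=n_inputs)
            if sum(e) <= max_degree]

def features(x, exponents):
    """Evaluate every monomial at the point x."""
    row = []
    for e in exponents:
        value = 1.0
        for xi, p in zip(x, e):
            value *= xi ** p
        row.append(value)
    return row

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_polynomial(samples, responses, max_degree):
    """Least-squares coefficients from the normal equations (X^T X) c = X^T y."""
    exps = monomial_exponents(len(samples[0]), max_degree)
    X = [features(x, exps) for x in samples]
    m = len(exps)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * y for row, y in zip(X, responses)) for i in range(m)]
    return exps, solve_linear(XtX, Xty)

def predict(x, exps, coeffs):
    """Evaluate the fitted polynomial at x."""
    return sum(c * f for c, f in zip(coeffs, features(x, exps)))

def truth(x):
    """Toy response: a known quadratic in two inputs."""
    return 1.0 + 2.0 * x[0] - 3.0 * x[1] + 0.5 * x[0] * x[1] + x[1] ** 2

train = [(0.25 * i, 0.25 * j) for i in range(5) for j in range(5)]
exps, coeffs = fit_polynomial(train, [truth(x) for x in train], max_degree=2)
error = abs(predict((0.3, 0.7), exps, coeffs) - truth((0.3, 0.7)))
print(len(exps), error)
```

The number of monomials grows quickly with the degree and the number of inputs, which is one reason why a high-degree regression needs many well-placed training samples. Forming the normal equations is the simplest approach but squares the condition number; a production code would typically prefer a more robust solver.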
As a last step in the training process, the important parameters of the trained surrogates are saved into .rd files. These files can be used to construct the surrogate models again without the need to carry out the training process from the beginning.
Evaluation of surrogate models
To evaluate the surrogate models, a new master input file has to be created for each of the uniform and normal parameter distributions. The input files contain testing distributions for the parameters, defined in the Distributions block. In this study, the training distributions are used for testing the surrogates as well. Both surrogate models are tested using the same parameter samples, which are selected using a LatinHypercube defined in the Samplers block. Since the surrogate models are orders of magnitude faster than the full-order model, considerably more samples are selected for testing than were used for training.
As a next step, two objects are created in the Surrogates block, one for each surrogate modeling technique. Both of them are constructed using the information available within the corresponding .rd files.
These surrogate models can be evaluated at the points defined in the testing sample batch. This is done using objects in the Reporters block.
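The evaluation step amounts to a Monte Carlo loop over the testing samples followed by computing sample statistics. The sketch below illustrates this with a stand-in surrogate (the closed-form maximum temperature for a zero-flux left boundary), plain random sampling instead of a Latin hypercube, and hypothetical uniform testing distributions; none of these choices are taken from the input files.

```python
import random
import statistics

def surrogate(k, q, L, T_inf):
    """Stand-in for a trained surrogate; here simply a closed-form expression."""
    return q * L ** 2 / (2.0 * k) + T_inf

rng = random.Random(1)
n_test = 20000

# Hypothetical uniform testing distributions for (k, q, L, T_inf).
test_samples = [
    (rng.uniform(1.0, 10.0), rng.uniform(9000.0, 11000.0),
     rng.uniform(0.01, 0.05), rng.uniform(290.0, 310.0))
    for _ in range(n_test)
]

# Evaluating the surrogate is cheap, so a large batch is affordable.
outputs = [surrogate(*sample) for sample in test_samples]
mean_qoi = statistics.fmean(outputs)
std_qoi = statistics.stdev(outputs)
print(f"mean = {mean_qoi:.3f}, std = {std_qoi:.3f}")
```

Feeding the same sample batch to every surrogate, as done in this study, removes sampling noise from the comparison between the surrogates themselves.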
Results and Analysis
In this section the results from the different surrogate models are provided. They are compared to the reference results summarized in Table 1. A short analysis of the results is provided as well to showcase potential issues the user might encounter when using polynomial regression.
Uniform parameter distributions
First, the case with uniformly distributed parameters is investigated. The statistical moments obtained by evaluating the surrogate models are summarized in Table 2.
Table 2: Comparison of the statistical moments from different surrogate models assuming uniform parameter distributions.
Moment | Reference | Poly. Chaos | Poly. Reg. (deg. 4) | Poly. Reg. (deg. 8) |
---|---|---|---|---|
$\mu$ | 301.3219 | 301.3218 | 301.3234 | 301.3220 |
$\sigma$ | 5.9585 | 5.9586 | 5.9625 | 5.9655 |
The polynomial chaos surrogate matches the reference values most closely overall. Increasing the polynomial degree of the regression from 4 to 8 brings the mean closer to the reference but slightly reduces the accuracy of the standard deviation. The histogram of the results is presented in Figure 1, where the results for the polynomial regression surrogate were obtained using max_degree=4. The two methods give similar distributions.
Figure 1: Histogram of the maximum temperature coming from the Monte Carlo run using the surrogate models and assuming uniform parameter distributions.
Normal parameter distributions
Next, the case with normally distributed parameters is analyzed. The statistical moments obtained by testing the surrogate models are summarized in Table 3.
Table 3: Comparison of the statistical moments from different surrogate models assuming normal distributions.
Moment | Reference | Poly. Chaos | Poly. Reg. (deg. 4) | Poly. Reg. (deg. 8) |
---|---|---|---|---|
$\mu$ | 301.2547 | 301.3162 | 301.5663 | 301.5810 |
$\sigma$ | 10.0011 | 10.1125 | 11.2912 | 30.1675 |
The polynomial chaos surrogate again gives the results closest to the reference values. Furthermore, increasing the polynomial degree of the regression decreases the accuracy of both the mean and the standard deviation. This behavior is often referred to as overfitting: the accuracy decreases as the number of model parameters increases. The histogram of the results is presented in Figure 2, where the results for the polynomial regression surrogate were obtained using max_degree=4. The two methods give similar distributions; however, the tails of the polynomial regression histogram are longer.
Figure 2: Histogram of the maximum temperature coming from the Monte Carlo run using the surrogate models and assuming normal parameter distributions.
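The loss of accuracy can be quantified by converting the entries of Table 3 into relative errors with respect to the reference. The short script below does only this arithmetic; the numbers are copied from the table, and the dictionary layout and function name are illustrative choices.

```python
# Values from Table 3 (normal parameter distributions).
reference = {"mean": 301.2547, "std": 10.0011}
normal_case = {
    "Poly. Chaos": {"mean": 301.3162, "std": 10.1125},
    "Poly. Reg. (deg. 4)": {"mean": 301.5663, "std": 11.2912},
    "Poly. Reg. (deg. 8)": {"mean": 301.5810, "std": 30.1675},
}

def relative_error(value, ref):
    """Absolute difference scaled by the magnitude of the reference."""
    return abs(value - ref) / abs(ref)

rel_err = {
    name: {moment: relative_error(value, reference[moment])
           for moment, value in moments.items()}
    for name, moments in normal_case.items()
}

for name, errs in rel_err.items():
    print(f"{name:22s} mean: {errs['mean']:.2e}   std: {errs['std']:.2%}")
```

The relative error of the standard deviation grows from roughly 1% for polynomial chaos to about 13% for the degree 4 regression and about 200% for the degree 8 regression, while the mean stays within roughly 0.1% in all three cases.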