GaussianProcessTrainer

"Gaussian Processes for Machine Learning" Rasmussen and Williams (2005) provides a well written discussion of Gaussian Processes, and its reading is highly encouraged. Chapters 1-5 cover the topics presented here with far greater detail, depth, and rigor. Furthermore, for a detailed overview on Gaussian Processes that model multiple outputs together (multi-output Gaussian Process or MOGP) we refer the reader to Liu et al. (2018).

The documentation here is meant to give some practical insight for users to begin creating surrogate models with Gaussian Processes.

Given a set of inputs X={x1,,xm}X=\lbrace{\vec{x}_1, \cdots, \vec{x}_m \rbrace} for which we have made observations of the correspond outputs Y={y1,,ym}Y=\lbrace{\vec{y}_1, \cdots, \vec{y}_m \rbrace} using the system (Y=f(X)Y = f(X)). Given another set of inputs X={x1,,xn}X_\star=\lbrace{\vec{x}_{\star 1}, \cdots, \vec{x}_{\star n} \rbrace} we wish to predict the associated outputs Y=f(X)Y_\star=f(X_\star) without evaluation of f(X)f (X_\star), which is presumed costly.

Parameter Covariance

In overly simplistic terms, Gaussian Process Modeling is driven by the idea that trials which are "close" in their input parameter space will be "close" in their output space. Closeness in the parameter space is driven by the covariance function k(x,x)k(\vec{x},\vec{x'}) (also called a kernel function, not to be confused with a MOOSE Framework kernel). This covariance function is used to generate a covariance matrix between the complete set of parameters XX={x1,,xm,x1,,xn}X \cup X_\star = \lbrace {\vec{x}_1, \cdots, \vec{x}_m, \vec{x}_{\star 1}, \cdots, \vec{x}_{\star n} \rbrace}, which can then be interpreted block-wise as various covariance matrices between XX and XX_\star.

K(XX,XX)=[k(x1,x1)k(x1,xm)k(x1,x1)k(x1,xn)(xm,x1)k(xm,xm)k(xm,x1)k(xm,xn)k(x1,x1)k(x1,xm)k(x1,x1)k(x1,xn)k(xn,x1)k(xn,xm)k(xn,x1)k(xn,xn)]=[K(X,X)K(X,X)K(X,X)K(X,X)]=[KKKTK]\begin{aligned} \mathbf{K}(X \cup X_\star,X \cup X_\star) & = \left[ \begin{array}{ccc|ccc} k(\vec{x}_1,\vec{x}_1) & \cdots & k(\vec{x}_1,\vec{x}_m) & k(\vec{x}_{1},\vec{x}_{\star 1}) & \cdots & k(\vec{x}_{1},\vec{x}_{\star n}) \\ \vdots & & \vdots & \vdots & & \vdots \\ (\vec{x}_m,\vec{x}_1) & \cdots & k(\vec{x}_m,\vec{x}_m) & k(\vec{x}_{m},\vec{x}_{\star 1}) & \cdots & k(\vec{x}_{m},\vec{x}_{\star n}) \\ \hline k(\vec{x}_{\star 1},\vec{x}_{1}) & \cdots & k(\vec{x}_{\star 1},\vec{x}_{m}) & k(\vec{x}_{\star 1},\vec{x}_{\star 1}) & \cdots & k(\vec{x}_{\star 1},\vec{x}_{\star n}) \\ \vdots & & \vdots & \vdots & & \vdots \\ k(\vec{x}_{\star n},\vec{x}_{1}) & \cdots & k(\vec{x}_{\star n},\vec{x}_{m}) & k(\vec{x}_{\star n},\vec{x}_{\star 1}) & \cdots & k(\vec{x}_{\star n},\vec{x}_{\star n}) \end{array} \right] \\ & =\left[ \begin{array}{c|c} \mathbf{K}(X,X) & \mathbf{K}(X,X_\star) \\ \hline \mathbf{K}(X_\star,X) & \mathbf{K}(X_\star,X_\star) \end{array} \right] \\ & =\left[ \begin{array}{c|c} \mathbf{K} & \mathbf{K}_\star \\ \hline \mathbf{K}_\star^T & \mathbf{K}_{\star \star} \end{array} \right] \end{aligned}

The Gaussian Process Model consists of an infinite collection of functions, all of which agree with the training/observation data. Importantly the collection has closed forms for 2nd order statistics (mean and variance). When used as a surrogate, the nominal value is chosen to be the mean value. The method can be broken down into two step: definition of the prior distribution then conditioning on observed data.

Gaussian processes

A Gaussian Process is a (potentially infinite) collection of random variables, such that the joint distribution of every finite selection of random variables from the collection is a Gaussian distribution.

GP(μ(x),k(x,x))\mathcal{GP}(\mu(\vec{x}),k(\vec{x},\vec{x'}))(1)

In an analogous way that a multivariate Gaussian is completely defined by its mean vector and its covariance matrix, a Gaussian Process is completely defined by its mean function and covariance function.

The (potentially) infinite number of random variables within the Gaussian Process correspond to the (potentially) infinite points in the parameter space our surrogate can be evaluated at.

Prior distribution:

We assume the observations (both training and testing) are pulled from an m+nm+n multivariate Gaussian distribution. The covariance matrix Σ\Sigma is the result of the choice of covariance function.

YYN(μ,Σ)Y \cup Y_\star \sim \mathcal{N}(\mu,\Sigma)

Note that μ\mu and Σ\Sigma are a vector and matrix respectively, and are a result of the mean and covariance functions applied to the sample points.

Zero Mean Assumption: Discussions of Gaussian Process are typically presented under assumption that μ=0\mu=0. This occurs without loss of generality since any sample can be made μ=0\mu=0 by subtracting the sample mean (or a variety of other preprocessing options). Note that in a training\testing paradigm, the testing data YY_\star is unknown, so determination of what to use as μ\mu is based on the information from the training data YY (or some other prior assumption).

Conditioning:

With the prior formed as above, conditioning on the available training data YY is performed. This alters the mean and variance to new values μ\mu_\star and Σ\Sigma_\star, restricting the set of possible functions which agree with the training data.

μ=μ+KK1(Yμ)Σ=KKTK1K\begin{aligned} \mu_\star &= \mu + \mathbf{K}_\star \mathbf{K}^{-1}(Y-\mu) \\ \Sigma_\star &= \mathbf{K}_{\star \star} - \mathbf{K}_\star^T \mathbf{K}^{-1} \mathbf{K}_\star \end{aligned}YN(μ,Σ)Y_\star \sim \mathcal{N}(\mu_\star ,\Sigma_\star)

When used as a surrogate, the nominal value is typically taken as the mean value, with diag(Σ)diag(\Sigma_\star) providing variances which can be used to generate confidence intervals.

Notes on Multi-Output Gaussian Processes (MOGPs)

MOGPs model and predict the outputs which are vectors, each of size MM. For any input vector xi\vec{x}_i, the vector of outputs yi\vec{y}_i and the matrix of NN vectors YY is defined as:

yi=[f(xi)1, f(xi)2,, f(xi)M]Y=[y1, y2,, yN] \begin{aligned} &\vec{y}_i = [f(\vec{x}_i)^1,~f(\vec{x}_i)^2,\dots,~f(\vec{x}_i)^M]^\intercal \\ &Y = [\vec{y}_1,~\vec{y}_2,\dots,~\vec{y}_N]^\intercal \\ \end{aligned}(2)

where yi\vec{y}_i is of size M×1M\times 1 and YY is of size N×MN\times M. The matrix YY is vectorized and represented as y^\hat{\mathbf{y}} with size NM×1NM \times 1. y^\hat{\mathbf{y}} is modeled as a Gaussian distribution defined as described in Eq. (1).

In a multi-output Gaussian Process, K\mathbf{K} captures covariances across the input variables and the vector of outputs and hence has size NM×NMNM \times NM. K\mathbf{K} can be modeled in several ways as discussed in (Liu et al., 2018; Alvarez et al., 2012). We will follow the linear model of coregionalization (LMC) which introduces latent functions with restrictons on the associated covariances.

Common Hyperparameters

While the only apparent decision in the above formulation is the choice of covariance function, most covariance functions will contain hyperparameters of some form which need to be selected in some manner. While each covariance function will have its own set of hyperparameters, a few hyperparameters of specific forms are present in many common covariance functions.

Length Factor \ell or \vec{\ell}

Frequently Kernels consider the distance between two input parameters x\vec{x} and x\vec{x}^\prime. For system of only a single parameter this distance often takes the form of

xx.\frac{|x - x^\prime|}{\ell}.

In this form the \ell factor set a relevant length scale for the distance measurements.

When multiple input parameters are to be considered, it may be advantageous to specify nn different length scales for each of the nn parameters, resulting in a vector \vec{\ell}. For example distance may be calculated as

i=1n(xixii)2.\sqrt{ \sum_{i=1}^n \left( \frac{x_i - x^\prime_i}{\ell_i} \right)^2}.

When used with standardized parameters, \ell can be interpreted in units of standard deviation for the relevant parameter.

Signal Variance σf2\sigma_f^2

This serves as an overall scaling parameter. Given a covariance function k~\tilde{k} (which is not a function of σf2\sigma_f^2), the multiplication of σf2\sigma_f^2 yields a new valid covariance function.

k(x,x,σf)=σf2k~(x,x)k(x,x^\prime,\sigma_f) = \sigma_f^2 \, \tilde{k}(x,x^\prime)

This multiplication can also be pulled out of the covariance matrix formation, and simply multiply the matrix formed by k~\tilde{k}

K(x,x,σf)=σf2K~(x,x)\mathbf{K}(x,x^\prime,\sigma_f) = \sigma_f^2 \, \tilde{\mathbf{K}}(x,x^\prime)

Noise Variance σn2\sigma_n^2

The σn2\sigma_n^2 represents noise in the collected data, and is as a additional σn2\sigma_n^2 factor on the variance terms (when x=xx=x^\prime).

k(x,x,σf,σn)=σf2k~(x,x)+σn2δx,xk(x,x^\prime,\sigma_f, \sigma_n) = \sigma_f^2 \, \tilde{k}(x,x^\prime) + \sigma_n^2 \, \delta_{x,x^\prime}

In the matrix representation this adds a factor of σn2\sigma_n^2 to diagonal of the noiseless matrix K~\tilde{\mathbf{K}}

K(x,x,σf,σn)=σf2K~(x,x)+σn2I\mathbf{K}(x,x^\prime,\sigma_f, \sigma_n) = \sigma_f^2 \, \tilde{\mathbf{K}}(x,x^\prime) + \sigma_n^2 \mathbf{I}

Due to the addition of σn2\sigma_n^2 along the diagonal of the KK matrix, this hyperparameter can aid in the inversion of the covariance matrix. For this reason adding a small amount of σn2\sigma_n^2 may be preferable, even when you believe the data to be noise free.

Selected Covariance Functions

Table 1: Selected Covariance Functions

Covariance FunctionDescription
SquaredExponentialCovarianceAlso referred to as a radial basis function (RBF) this is a widely used, general purpose covariance function. Serves as a common starting point for many. Used for single-output GPs
ExponentialCovarianceA simple exponential covariance function. Used for single-output GPs
MaternHalfIntCovarianceImplementation of the Matern class of covariance function, where the ν\nu parameter takes on half-integer values. Used for single-output GPs
LMCCovariance function built using the linear model of coregionalization. Used for multi-output GPs

Hyperparameter tuning options

The following options are available for tuning the hyperparameters:

  • Adaptive moment estimation (Adam)

    Relies on the pseudocode provided in Kingma and Ba (2014). Adam permits stochastic optimization, wherein, a batch of the training data can be randomly chosen at each iteration.

The hyper-parameters of the GPs are inferred by optimizing the log-likelihood function. The MOGP log-likelihood function has a general form:

L=12 lnK12 y^T K1 y^12 N ln(2π) \mathcal{L} = -\frac{1}{2}~\ln |\mathbf{K}| - \frac{1}{2}~\hat{\mathbf{y}}^T~\mathbf{K}^{-1}~\hat{\mathbf{y}}-\frac{1}{2}~N~\ln(2\pi)(3)

The optimization of GPs can be expensive. If there are NN training points each with MM outputs, each training iteration of Adam has a cost of O(M3N3)\mathcal{O} (M^3N^3). Adam permits using n<Nn < N random training points during each iteration (mini-batches) which has a cost of O(M3n3)<<O(M3N3)\mathcal{O}(M^3n^3)<<\mathcal{O}(M^3N^3).

Input Parameters

  • covariance_functionName of covariance function.

    C++ Type:UserObjectName

    Unit:(no unit assumed)

    Controllable:No

    Description:Name of covariance function.

  • responseReporter value of response results, can be vpp with / or sampler column with 'sampler/col_'.

    C++ Type:ReporterName

    Unit:(no unit assumed)

    Controllable:No

    Description:Reporter value of response results, can be vpp with / or sampler column with 'sampler/col_'.

  • samplerSampler used to create predictor and response data.

    C++ Type:SamplerName

    Unit:(no unit assumed)

    Controllable:No

    Description:Sampler used to create predictor and response data.

Required Parameters

  • batch_size0The batch size for Adam optimization

    Default:0

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:The batch size for Adam optimization

  • converged_reporterReporter value used to determine if a sample's multiapp solve converged.

    C++ Type:ReporterName

    Unit:(no unit assumed)

    Controllable:No

    Description:Reporter value used to determine if a sample's multiapp solve converged.

  • cv_n_trials1Number of repeated trials of cross-validation to perform.

    Default:1

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:Number of repeated trials of cross-validation to perform.

  • cv_seed4294967295Seed used to initialize random number generator for data splitting during cross validation.

    Default:4294967295

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:Seed used to initialize random number generator for data splitting during cross validation.

  • cv_splits10Number of splits (k) to use in k-fold cross-validation.

    Default:10

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:Number of splits (k) to use in k-fold cross-validation.

  • cv_surrogateName of Surrogate object used for model cross-validation.

    C++ Type:UserObjectName

    Unit:(no unit assumed)

    Controllable:No

    Description:Name of Surrogate object used for model cross-validation.

  • cv_typenoneCross-validation method to use for dataset. Options are 'none' or 'k_fold'.

    Default:none

    C++ Type:MooseEnum

    Unit:(no unit assumed)

    Options:none, k_fold

    Controllable:No

    Description:Cross-validation method to use for dataset. Options are 'none' or 'k_fold'.

  • execute_onTIMESTEP_ENDThe list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.

    Default:TIMESTEP_END

    C++ Type:ExecFlagEnum

    Unit:(no unit assumed)

    Options:NONE, INITIAL, LINEAR, NONLINEAR_CONVERGENCE, NONLINEAR, POSTCHECK, TIMESTEP_END, TIMESTEP_BEGIN, MULTIAPP_FIXED_POINT_END, MULTIAPP_FIXED_POINT_BEGIN, FINAL, CUSTOM

    Controllable:No

    Description:The list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.

  • filenameThe name of the file which will be associated with the saved/loaded data.

    C++ Type:FileName

    Unit:(no unit assumed)

    Controllable:No

    Description:The name of the file which will be associated with the saved/loaded data.

  • learning_rate0.001The learning rate for Adam optimization

    Default:0.001

    C++ Type:double

    Unit:(no unit assumed)

    Controllable:No

    Description:The learning rate for Adam optimization

  • num_iters1000Tolerance value for Adam optimization

    Default:1000

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:Tolerance value for Adam optimization

  • predictor_colsSampler columns used as the independent random variables, If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

    C++ Type:std::vector<unsigned int>

    Unit:(no unit assumed)

    Controllable:No

    Description:Sampler columns used as the independent random variables, If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

  • predictorsReporter values used as the independent random variables, If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

    C++ Type:std::vector<ReporterName>

    Unit:(no unit assumed)

    Controllable:No

    Description:Reporter values used as the independent random variables, If 'predictors' and 'predictor_cols' are both empty, all sampler columns are used.

  • prop_getter_suffixAn optional suffix parameter that can be appended to any attempt to retrieve/get material properties. The suffix will be prepended with a '_' character.

    C++ Type:MaterialPropertyName

    Unit:(no unit assumed)

    Controllable:No

    Description:An optional suffix parameter that can be appended to any attempt to retrieve/get material properties. The suffix will be prepended with a '_' character.

  • response_typerealResponse data type.

    Default:real

    C++ Type:MooseEnum

    Unit:(no unit assumed)

    Options:real, vector_real

    Controllable:No

    Description:Response data type.

  • show_every_nth_iteration0Switch to show Adam optimization loss values at every nth step. If 0, nothing is showed.

    Default:0

    C++ Type:unsigned int

    Unit:(no unit assumed)

    Controllable:No

    Description:Switch to show Adam optimization loss values at every nth step. If 0, nothing is showed.

  • skip_unconverged_samplesFalseTrue to skip samples where the multiapp did not converge, 'stochastic_reporter' is required to do this.

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:True to skip samples where the multiapp did not converge, 'stochastic_reporter' is required to do this.

  • standardize_dataTrueStandardize (center and scale) training data (y values)

    Default:True

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Standardize (center and scale) training data (y values)

  • standardize_paramsTrueStandardize (center and scale) training parameters (x values)

    Default:True

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Standardize (center and scale) training parameters (x values)

  • tune_parametersSelect hyperparameters to be tuned

    C++ Type:std::vector<std::string>

    Unit:(no unit assumed)

    Controllable:No

    Description:Select hyperparameters to be tuned

  • tuning_maxMaximum allowable tuning value

    C++ Type:std::vector<double>

    Unit:(no unit assumed)

    Controllable:No

    Description:Maximum allowable tuning value

  • tuning_minMinimum allowable tuning value

    C++ Type:std::vector<double>

    Unit:(no unit assumed)

    Controllable:No

    Description:Minimum allowable tuning value

  • use_interpolated_stateFalseFor the old and older state use projected material properties interpolated at the quadrature points. To set up projection use the ProjectedStatefulMaterialStorageAction.

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:For the old and older state use projected material properties interpolated at the quadrature points. To set up projection use the ProjectedStatefulMaterialStorageAction.

Optional Parameters

  • allow_duplicate_execution_on_initialFalseIn the case where this UserObject is depended upon by an initial condition, allow it to be executed twice during the initial setup (once before the IC and again after mesh adaptivity (if applicable).

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:In the case where this UserObject is depended upon by an initial condition, allow it to be executed twice during the initial setup (once before the IC and again after mesh adaptivity (if applicable).

  • control_tagsAdds user-defined labels for accessing object parameters via control logic.

    C++ Type:std::vector<std::string>

    Unit:(no unit assumed)

    Controllable:No

    Description:Adds user-defined labels for accessing object parameters via control logic.

  • enableTrueSet the enabled status of the MooseObject.

    Default:True

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:Yes

    Description:Set the enabled status of the MooseObject.

  • execution_order_group0Execution order groups are executed in increasing order (e.g., the lowest number is executed first). Note that negative group numbers may be used to execute groups before the default (0) group. Please refer to the user object documentation for ordering of user object execution within a group.

    Default:0

    C++ Type:int

    Unit:(no unit assumed)

    Controllable:No

    Description:Execution order groups are executed in increasing order (e.g., the lowest number is executed first). Note that negative group numbers may be used to execute groups before the default (0) group. Please refer to the user object documentation for ordering of user object execution within a group.

  • force_postauxFalseForces the UserObject to be executed in POSTAUX

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Forces the UserObject to be executed in POSTAUX

  • force_preauxFalseForces the UserObject to be executed in PREAUX

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Forces the UserObject to be executed in PREAUX

  • force_preicFalseForces the UserObject to be executed in PREIC during initial setup

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Forces the UserObject to be executed in PREIC during initial setup

  • use_displaced_meshFalseWhether or not this object should use the displaced mesh for computation. Note that in the case this is true but no displacements are provided in the Mesh block the undisplaced mesh will still be used.

    Default:False

    C++ Type:bool

    Unit:(no unit assumed)

    Controllable:No

    Description:Whether or not this object should use the displaced mesh for computation. Note that in the case this is true but no displacements are provided in the Mesh block the undisplaced mesh will still be used.

Advanced Parameters

References

  1. M. A. Alvarez, L. Rosasco, and N. D. Lawrence. Kernels for vector-valued functions: a review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.[BibTeX]
  2. D. P. Kingma and J. Ba. Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG], 2014. URL: https://doi.org/10.48550/arXiv.1412.6980.[BibTeX]
  3. H. Liu, J. Cai, and Y. S. Ong. Remarks on multi-output gaussian process regression. Knowledge-Based Systems, 144:102–112, 2018.[BibTeX]
  4. Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005. ISBN 026218253X.[BibTeX]