Creating a Surrogate Model

This example goes through the process of creating a custom surrogate model, in this case the creation of NearestPointSurrogate.

Overview

Building a surrogate model requires the creation of two objects: a SurrogateTrainer and a SurrogateModel. The SurrogateTrainer uses information from samplers and results to construct variables that are saved into a .rd file at the conclusion of the training run. The SurrogateModel object loads the data from the .rd file and contains a function called evaluate that evaluates the surrogate model at a given input. The SurrogateTrainer and SurrogateModel are heavily tied together: each has the same member variables, the difference being that one saves the data and the other loads it. It might be beneficial to have an interface class that contains common functions for training and evaluating, to avoid duplicated code. This example will not go into the creation of this interface class.
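
As a rough sketch of this pairing (not taken from the repository; the class names MyTrainer and MySurrogate and the member _coeff are hypothetical, and registration and validParams are omitted), a trainer declares its data with declareModelData and the matching surrogate retrieves the same-named data with getModelData:

#include "SurrogateTrainer.h"
#include "SurrogateModel.h"

// Hypothetical trainer: declares the data that is written to the .rd file.
class MyTrainer : public SurrogateTrainer
{
public:
  MyTrainer(const InputParameters & parameters)
    : SurrogateTrainer(parameters),
      // The string key "_coeff" must match the key used by the surrogate.
      _coeff(declareModelData<std::vector<Real>>("_coeff"))
  {
  }

protected:
  virtual void train() override { /* fill _coeff from the training data */ }

  /// Writable reference; its contents are saved at the end of training
  std::vector<Real> & _coeff;
};

// Hypothetical surrogate: loads the same data back and evaluates the model.
class MySurrogate : public SurrogateModel
{
public:
  MySurrogate(const InputParameters & parameters)
    : SurrogateModel(parameters),
      _coeff(getModelData<std::vector<Real>>("_coeff"))
  {
  }

  Real evaluate(const std::vector<Real> & x) const
  {
    return 0.0; // a real implementation would combine _coeff with the input x
  }

protected:
  /// Read-only reference to the loaded data
  const std::vector<Real> & _coeff;
};

The duplicated _coeff member in both classes is exactly the kind of repetition the interface class mentioned above could remove.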

Creating a Trainer

This example will go over the creation of NearestPointTrainer. Trainers are derived from SurrogateTrainer, which performs a loop over the training data and calls virtual functions that derived classes are meant to override to perform the proper training.

This example makes use of SurrogateTrainer input parameters, API functions, and protected member variables for gathering and accessing common types of training data. For further details on these features, see Trainers System.

validParams

The trainer requires the input of a sampler so that it knows how many data points are included and how they are distributed across processors. The trainer also needs the predictor and response values from the full-order model, which are stored in a vector postprocessor or reporter.

InputParameters
SurrogateTrainer::validParams()
{
  InputParameters params = SurrogateTrainerBase::validParams();
  params.addRequiredParam<SamplerName>("sampler",
                                       "Sampler used to create predictor and response data.");
  params.addParam<ReporterName>(
      "converged_reporter",
      "Reporter value used to determine if a sample's multiapp solve converged.");
  params.addParam<bool>("skip_unconverged_samples",
                        false,
                        "True to skip samples where the multiapp did not converge, "
                        "'stochastic_reporter' is required to do this.");

  // Common Training Data
  MooseEnum data_type("real=0 vector_real=1", "real");
  params.addRequiredParam<ReporterName>(
      "response",
      "Reporter value of response results, can be vpp with <vpp_name>/<vector_name> or sampler "
      "column with 'sampler/col_<index>'.");
  params.addParam<MooseEnum>("response_type", data_type, "Response data type.");
  params.addParam<std::vector<ReporterName>>(
      "predictors",
      std::vector<ReporterName>(),
      "Reporter values used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");
  params.addParam<std::vector<unsigned int>>(
      "predictor_cols",
      std::vector<unsigned int>(),
      "Sampler columns used as the independent random variables, If 'predictors' and "
      "'predictor_cols' are both empty, all sampler columns are used.");
  // End Common Training Data

  MooseEnum cv_type("none=0 k_fold=1", "none");
  params.addParam<MooseEnum>(
      "cv_type",
      cv_type,
      "Cross-validation method to use for dataset. Options are 'none' or 'k_fold'.");
  params.addRangeCheckedParam<unsigned int>(
      "cv_splits", 10, "cv_splits > 1", "Number of splits (k) to use in k-fold cross-validation.");
  params.addParam<UserObjectName>("cv_surrogate",
                                  "Name of Surrogate object used for model cross-validation.");
  params.addParam<unsigned int>(
      "cv_n_trials", 1, "Number of repeated trials of cross-validation to perform.");
  params.addParam<unsigned int>("cv_seed",
                                std::numeric_limits<unsigned int>::max(),
                                "Seed used to initialize random number generator for data "
                                "splitting during cross validation.");

  return params;
}
(contrib/moose/modules/stochastic_tools/src/trainers/SurrogateTrainer.C)
InputParameters
NearestPointTrainer::validParams()
{
  InputParameters params = SurrogateTrainer::validParams();
  params.addClassDescription("Loops over and saves sample values for [NearestPointSurrogate.md].");

  return params;
}
(contrib/moose/modules/stochastic_tools/src/trainers/NearestPointTrainer.C)
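
A custom trainer will typically extend these parameters with its own options using the same addParam/addRequiredParam calls shown above. As a hedged sketch (MyTrainer and the "num_neighbors" parameter are hypothetical and not part of NearestPointTrainer):

InputParameters
MyTrainer::validParams()
{
  InputParameters params = SurrogateTrainer::validParams();
  params.addClassDescription("Example trainer illustrating additional input parameters.");
  // Hypothetical option controlling how many nearby samples the model would use
  params.addParam<unsigned int>("num_neighbors", 1, "Number of nearest neighbors to average.");
  return params;
}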

By default, classes inheriting from SurrogateTrainer support Real and std::vector<Real> response types. For responses of these types, a Reporter can be specified using the "response" and "response_type" input parameters. Additionally, child classes can support predictor data given as a combination of Sampler columns and Reporter values; these options are controlled using the "predictors" and "predictor_cols" input parameters.

Classes derived from SurrogateTrainer also support k-fold cross-validation. For details on the input parameters for this capability, see Cross Validation. To ensure compatibility with these features, some extra considerations are needed during implementation of a new Trainer. These will be discussed in the following sections.

Constructor

All trainers are based on SurrogateTrainer, which provides the necessary interface for saving the surrogate model data and gathering the response and predictor data. Any additional data meant to be saved and gathered is defined in the constructor of the training object. In NearestPointTrainer, the variables _sample_points and _sample_results are declared as the necessary surrogate data (see Trainers for more information on declaring model data). Because we will use several default inputs to retrieve training data, we only need to resize these variables for the number of dimensions in the training data. Each processor will contain a portion of the samples and results; we will gather all the samples in postTrain().

NearestPointTrainer::NearestPointTrainer(const InputParameters & parameters)
  : SurrogateTrainer(parameters),
    _sample_points(declareModelData<std::vector<std::vector<Real>>>("_sample_points")),
    _sample_results(declareModelData<std::vector<std::vector<Real>>>("_sample_results")),
    _predictor_row(getPredictorData())
{
  _sample_points.resize(_n_dims);
  _sample_results.resize(1);
}
(contrib/moose/modules/stochastic_tools/src/trainers/NearestPointTrainer.C)

The member variables _sample_points, _sample_results, and _predictor_row are defined in the header file:

/// Map containing sample points and the results
std::vector<std::vector<Real>> & _sample_points;

/// Container for results (y values).
std::vector<std::vector<Real>> & _sample_results;

/// Data from the current predictor row
const std::vector<Real> & _predictor_row;
(contrib/moose/modules/stochastic_tools/include/trainers/NearestPointTrainer.h)

preTrain

preTrain() is called before the sampler loop. This is typically used to initialize variables and allocate memory. For NearestPointTrainer, we will explicitly clear _sample_points and _sample_results. This is done because the implementation of k-fold cross-validation used in SurrogateTrainer requires that preTrain() reset the state of the trainer and clear any essential data related to prior training sets (for further details, see Trainers System).

void
NearestPointTrainer::preTrain()
{
  for (auto & it : _sample_points)
  {
    it.clear();
    it.reserve(getLocalSampleSize());
  }

  for (auto & it : _sample_results)
  {
    it.clear();
    it.reserve(getLocalSampleSize());
  }
}
(contrib/moose/modules/stochastic_tools/src/trainers/NearestPointTrainer.C)

train

train() is where the actual training occurs. This function is called during the sampler loop for each row; at each call, the member variables _row and _local_row, as well as any data gathered with getTrainingData, are updated. The base-class members _rval and _rvecval point to the response data for the Real and std::vector<Real> response types, respectively, so only the one matching the specified "response_type" is valid. In NearestPointTrainer, we will push_back predictor and response data as appropriate. Using push_back is convenient because some samples may be skipped (due to non-convergence, or intentionally during cross-validation).

void
NearestPointTrainer::train()
{
  if (_rvecval && (_sample_results.size() != _rvecval->size()))
    _sample_results.resize(_rvecval->size());

  // Get predictors from reporter values
  for (auto d : make_range(_n_dims))
    _sample_points[d].push_back(_predictor_row[d]);

  // Get responses
  if (_rval)
    _sample_results[0].push_back(*_rval);
  else if (_rvecval)
    for (auto r : make_range(_rvecval->size()))
      _sample_results[r].push_back((*_rvecval)[r]);
}
(contrib/moose/modules/stochastic_tools/src/trainers/NearestPointTrainer.C)
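
If a trainer needs training data beyond the default response and predictors, the base class also provides a templated getTrainingData interface (described in the Trainers System documentation) for gathering an arbitrary Reporter value that is kept up to date during the sampler loop. A hedged sketch of how this might look is below; the "weights" reporter parameter, the members _weight and _sample_weights, and the exact getTrainingData signature are assumptions to be checked against the Trainers System documentation, not part of NearestPointTrainer.

// In the constructor's initialization list of a hypothetical trainer:
//   _weight(getTrainingData<Real>(getParam<ReporterName>("weights")))
//
// The reference then holds the current sampler row's value inside train():
void
MyTrainer::train()
{
  // Store the extra per-sample quantity alongside the other training data.
  _sample_weights.push_back(_weight);
}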

postTrain

postTrain() is called after the sampler loop. This is typically where processor communication happens. Here, we use postTrain() to gather all the local _sample_points and _sample_results so that each processor has a full copy. _communicator.allgather gives every processor a copy of the full array, whereas _communicator.gather places the full copy on only one processor; the latter is typically used when the data is only needed for output, which happens on the root processor. See libMesh::Parallel::Communicator for more communication options.

void
NearestPointTrainer::postTrain()
{
  for (auto & it : _sample_points)
    _communicator.allgather(it);

  for (auto & it : _sample_results)
    _communicator.allgather(it);
}
(contrib/moose/modules/stochastic_tools/src/trainers/NearestPointTrainer.C)
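
For contrast, a trainer that only needs the assembled data on the root processor (for example, purely for output) could use gather instead of allgather. The following is a hedged sketch rather than part of NearestPointTrainer, assuming libMesh's Communicator::gather overload that takes a root rank and a vector:

void
MyTrainer::postTrain()
{
  // Collect each processor's local samples onto rank 0 only; the other ranks
  // keep just their local portion. NearestPointTrainer uses allgather instead,
  // so that every rank holds the full data set when evaluating the surrogate.
  for (auto & it : _sample_points)
    _communicator.gather(0, it);
}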

Creating a Surrogate

This example will go over the creation of NearestPointSurrogate. Surrogates are specialized MooseObjects that must provide the public member function evaluate. The validParams for a surrogate will generally define options for how the surrogate is evaluated; NearestPointSurrogate does not have any such options.

Constructor

In the constructor, the references for the model data are defined, taken from the training data:

NearestPointSurrogate::NearestPointSurrogate(const InputParameters & parameters)
  : SurrogateModel(parameters),
    _sample_points(getModelData<std::vector<std::vector<Real>>>("_sample_points")),
    _sample_results(getModelData<std::vector<std::vector<Real>>>("_sample_results"))
{
}
(contrib/moose/modules/stochastic_tools/src/surrogates/NearestPointSurrogate.C)

See Surrogates for more information on the getModelData function. _sample_points in the surrogate is a const reference, since we do not want to modify the training data during evaluation:

/// Array containing sample points
const std::vector<std::vector<Real>> & _sample_points;
(contrib/moose/modules/stochastic_tools/include/surrogates/NearestPointSurrogate.h)

evaluate

evaluate is a public member function required for all surrogate models. This is where the surrogate model is actually used: evaluate takes in parameter values and returns the surrogate's estimate of the quantity of interest. See EvaluateSurrogate for an example of how the evaluate function is used.

Real
NearestPointSurrogate::evaluate(const std::vector<Real> & x) const
{
  // Check whether input point has same dimensionality as training data
  mooseAssert(_sample_points.size() == x.size(),
              "Input point does not match dimensionality of training data.");

  return _sample_results[0][findNearestPoint(x)];
}
(contrib/moose/modules/stochastic_tools/src/surrogates/NearestPointSurrogate.C)
unsigned int
NearestPointSurrogate::findNearestPoint(const std::vector<Real> & x) const
{
  unsigned int idx = 0;

  // Container of current minimum distance during training sample loop
  Real dist_min = std::numeric_limits<Real>::max();

  for (dof_id_type p = 0; p < _sample_points[0].size(); ++p)
  {
    // Sum over the distance of each point dimension
    Real dist = 0;
    for (unsigned int i = 0; i < x.size(); ++i)
    {
      Real diff = (x[i] - _sample_points[i][p]);
      dist += diff * diff;
    }

    // Check if this training point distance is smaller than the current minimum
    if (dist < dist_min)
    {
      idx = p;
      dist_min = dist;
    }
  }
  return idx;
}
(contrib/moose/modules/stochastic_tools/src/surrogates/NearestPointSurrogate.C)
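
To make the lookup concrete, the following framework-free sketch reproduces the same squared-distance search on a small hand-made data set (the values are invented for illustration). A query point is matched to its nearest training sample and the stored response is returned, which is what evaluate does through findNearestPoint:

#include <iostream>
#include <limits>
#include <vector>

int
main()
{
  // Training data laid out as in the trainer: sample_points[d][p] is dimension d
  // of sample p, and sample_results[0][p] is the corresponding response value.
  std::vector<std::vector<double>> sample_points = {{0.0, 1.0, 2.0},   // first dimension
                                                    {0.0, 1.0, 2.0}};  // second dimension
  std::vector<std::vector<double>> sample_results = {{10.0, 20.0, 30.0}};

  const std::vector<double> x = {1.2, 0.9}; // query point

  std::size_t idx = 0;
  double dist_min = std::numeric_limits<double>::max();
  for (std::size_t p = 0; p < sample_points[0].size(); ++p)
  {
    // Squared Euclidean distance between the query point and training sample p
    double dist = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double diff = x[i] - sample_points[i][p];
      dist += diff * diff;
    }
    if (dist < dist_min)
    {
      idx = p;
      dist_min = dist;
    }
  }

  // The query is closest to sample 1, so the "surrogate" prediction is 20.0.
  std::cout << sample_results[0][idx] << std::endl;
  return 0;
}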