- distributionsThe distribution names to be sampled, the number of distributions provided defines the number of columns per matrix.
C++ Type:std::vector<DistributionName>
Unit:(no unit assumed)
Controllable:No
Description:The distribution names to be sampled, the number of distributions provided defines the number of columns per matrix.
- inputs_reporterReporter with input parameters.
C++ Type:ReporterName
Unit:(no unit assumed)
Controllable:No
Description:Reporter with input parameters.
- num_samplessubNumber of samples per subset
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:Number of samples per subset
- num_subsetsNumber of desired subsets
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:Number of desired subsets
- output_reporterReporter with results of samples created by the SubApp.
C++ Type:ReporterName
Unit:(no unit assumed)
Controllable:No
Description:Reporter with results of samples created by the SubApp.
Parallel Subset Simulation (PSS)
Parallel Subset Simulation sampler.
Description
PSS is used for efficiently estimating small failure probabilities and for conducting optimization under uncertainties when dealing with computationally expensive numerical models. It can result in 2-3 orders of magnitude fewer calls to the numerical model when estimating failure probabilities or conducting optimization under uncertainty compared to standard Monte Carlo or Latin Hypercube Sampling. PSS works by creating intermediate failure thresholds to efficiently transition from sampling from the nominal input space to sampling from input spaces which optimize the numerical model output. Use of Markov Chain Monte Carlo (MCMC) is key to the PSS algorithm. In fact, PSS uses hundreds of Markov Chains to efficiently propagate to the regions of the input space that are most important with respect to optimizing the model output. Since these Markov chains are independent of each other, they can be run in parallel on a different set of processors. It should be noted that parallelization can only be achieved across Markov chains and not within a chain. That is, a given set of processors should completely run an entire chain composed of a certain number of samples (this number is usually 10). Coupled with several Markov chains and massively parallel computing, PSS may require only 1E1 to 1E2 order of magnitude numerical model evaluations for accurately estimating low failure probabilities or for conducting optimization studies.
Brief algorithmic details
Exhaustive algorithmic details of PSS have been presented in Au and Beck (2001). Only a brief description of the algorithm is presented here. To efficiently sample from regions of input parameter spaces that are crucial for optimization and failure evaluation of the model output, PSS creates intermediate failure thresholds defined through the equation:
(1)
where, is the failure probability of interest, and are the intermediate failure probabilities defining the intermediate failure thresholds, and is the number of subsets. Through the intermediate failure probabilities and PSS creates intermediate failure thresholds that allow to efficiently transition to sampling from input spaces which cause numerical model failure or numerical model optimal output. These intermediate failure thresholds are created with the aid of numerous Markov chains.
In practice, SS is implemented in the following manner. If is the total number of numerical model evaluations, each subset will have samples. An MCS is first used to generate samples. If the intermediate failure probabilities (except ) are all fixed to 0.1, then the first intermediate failure threshold is estimated as the percentile value of all the numerical model outputs. The outputs that do not exceed constitute Subset 1. To determine the next failure threshold , conditional samples should be generated such that the numerical model outputs always exceed . An MCMC method—in particular, a component-wise Metropolis method—is used to estimate by simulating numerous Markov chains. From the MCS samples in the first subset, those that exceeded the threshold are used as seeds (or starting values) for these Markov chains. If samples need to be simulated such that the outputs exceed , there will be Markov chains, with each chain simulating samples. In general, if the intermediate failure probabilities (except ) are fixed to instead of 0.1, there will be Markov chains, with each chain simulating samples. Once samples are generated from Markov chains, the second intermediate failure threshold is the percentile value of all the samples' outputs. Samples between and comprise the second subset. A similar procedure of simulating Markov chains is repeated for determining the subsequent failure thresholds until the final required failure threshold is reached. More details on the practical implementation of the PSS method are presented in Li and Cao (2016). Figure 1 presents a schematic of the PSS method.
Figure 1: Schematic of the PSS method
Parallelization of Parallel Subset Simulation: An example
Since PSS relies of numerous Markov chains, it can be parallelized. Each Markov chain can be run independently on separate sets of processors in parallel to other Markov chains. For optimal PSS performance, it is recommended that the number of samples evaluated by each processor is a multiple of 10. For example, consider that we are interested in evaluating the small failure probability (of the order ) of a numerical model. For this, we select 4 subsets with 10,000 samples per subset. In total, there will be 40,000 numerical model evaluations. If we use 1000 processors with as the intermediate failure probability, there will be 1000 Markov chains per subset with each chain making 10 numerical model evaluations. Therefore each processor evaluates the numerical model 10 times per subset. With 4 subsets, each processor evaluates the numerical model 40 times. In contrast, if we use only 100 processors, each processor solves the numerical model 100 times per subset. With 4 subsets, each processor solves the numerical model 400 times in total.
Alternatively, if we use 2000 processors, a unique set of two processors jointly solves the numerical model 10 times per subset and 40 times overall. Beyond 1000 processors, the number of numerical model solves per processor cannot be reduced further. This is due to the fact that parallelization can only be achieved across Markov chains and not within a chain. Each processor should solve the numerical model a minimum of 10 times per subset in this example because the number of samples in a Markov chain are with the intermediate failure probability fixed to 0.1.
In MOOSE, currently, Markov chain proposals are made by transforming all the inputs into a standard Normal space and then adding a Gaussian noise to a previously accepted sample with a unit standard deviation. Expanding this proposal scheme to incorporate other MCMC algorithms is possible and would be of interest in the future.
Interaction with the AdaptiveMonteCarloDecision
class
The PSS algorithm is an adaptive algorithm and it has a proposal step for all the Markov chains and a decision-making step on whether or not to accept the proposed samples. While the ParallelSubsetSimulation
class proposes new samples across multiple Markov chains, the AdaptiveMonteCarloDecision
class is used for decision-making. Please refer to AdaptiveMonteCarloDecision for more details.
If the parameter "use_absolute_value" is True
, the absolute value of "output_reporter" will be maximized. As such, a least-squares fit optimization can be achieved by passing the negative of the sum-squared error into the value of "output_reporter" with "use_absolute_value" set to True
. Otherwise the value of "output_reporter" will be maximized after its absolute value is taken.
Example Input Syntax
The input file for using the PSS algorithm is somewhat similar to the other sampler classes except for three differences. First, the Samplers
block is presented below:
where, "num_samplessub" is the number of samples per subset and "num_subsets" is the number of subsets. "inputs_reporter" and "output_reporter" are the reporter values which transfer information between the ParallelSubsetSimulation
sampler and the AdaptiveMonteCarloDecision
reporter. There is an optional input parameter "subset_probability" which has been defaulted to 0.1
, meaning that there are samples per Markov chain. This can, however, be changed as per the user preference. "num_parallel_chains" is also an optional parameter that explicitly specifies the number of Markov chains to be run in parallel per subset. If "num_parallel_chains" is not specified, the number of parallel Markov chains per subset will be equal to the number of processors. Besides, "use_absolute_value" can be set to true when failure is defined as a non-exceedance rather than an exceedance.
Second, the Reporters
block is presented below with the AdaptiveMonteCarloDecision
reporter:
where, the output and input reporters are both initialized.
Third, the Executioner
block is presented below:
where it is noticed that unlike some other sampler classes, the type
is transient. The number of time steps is automatically determined based on "num_samplessub", "num_subsets" and the number of processors used.
Output format
The ParallelSubsetSimulation
sampler can output a csv or a json file. However, a json output is a preferred output format for post-processing sampler data from adaptive Monte Carlo algorithms. The json file consists of "inputs" and "output_required" corresponding to each "time_step" across all the processors.
Input Parameters
- execute_onLINEARThe list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.
Default:LINEAR
C++ Type:ExecFlagEnum
Unit:(no unit assumed)
Controllable:No
Description:The list of flag(s) indicating when this object should be executed. For a description of each flag, see https://mooseframework.inl.gov/source/interfaces/SetupInterface.html.
- limit_get_global_samples429496729The maximum allowed number of items in the DenseMatrix returned by getGlobalSamples method.
Default:429496729
C++ Type:unsigned long
Unit:(no unit assumed)
Controllable:No
Description:The maximum allowed number of items in the DenseMatrix returned by getGlobalSamples method.
- limit_get_local_samples429496729The maximum allowed number of items in the DenseMatrix returned by getLocalSamples method.
Default:429496729
C++ Type:unsigned long
Unit:(no unit assumed)
Controllable:No
Description:The maximum allowed number of items in the DenseMatrix returned by getLocalSamples method.
- limit_get_next_local_row429496729The maximum allowed number of items in the std::vector returned by getNextLocalRow method.
Default:429496729
C++ Type:unsigned long
Unit:(no unit assumed)
Controllable:No
Description:The maximum allowed number of items in the std::vector returned by getNextLocalRow method.
- max_procs_per_row4294967295This will ensure that the sampler is partitioned properly when 'MultiApp/*/max_procs_per_app' is specified. It is not recommended to use otherwise.
Default:4294967295
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:This will ensure that the sampler is partitioned properly when 'MultiApp/*/max_procs_per_app' is specified. It is not recommended to use otherwise.
- min_procs_per_row1This will ensure that the sampler is partitioned properly when 'MultiApp/*/min_procs_per_app' is specified. It is not recommended to use otherwise.
Default:1
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:This will ensure that the sampler is partitioned properly when 'MultiApp/*/min_procs_per_app' is specified. It is not recommended to use otherwise.
- num_parallel_chainsNumber of Markov chains to run in parallel, default is based on the number of processors used.
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:Number of Markov chains to run in parallel, default is based on the number of processors used.
- num_random_seeds100000Initialize a certain number of random seeds. Change from the default only if you have to.
Default:100000
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:Initialize a certain number of random seeds. Change from the default only if you have to.
- seed0Random number generator initial seed
Default:0
C++ Type:unsigned int
Unit:(no unit assumed)
Controllable:No
Description:Random number generator initial seed
- subset_probability0.1Conditional probability of each subset
Default:0.1
C++ Type:double
Unit:(no unit assumed)
Controllable:No
Description:Conditional probability of each subset
- use_absolute_valueFalseUse absolute value of the sub app output
Default:False
C++ Type:bool
Unit:(no unit assumed)
Controllable:No
Description:Use absolute value of the sub app output
Optional Parameters
- control_tagsAdds user-defined labels for accessing object parameters via control logic.
C++ Type:std::vector<std::string>
Unit:(no unit assumed)
Controllable:No
Description:Adds user-defined labels for accessing object parameters via control logic.
- enableTrueSet the enabled status of the MooseObject.
Default:True
C++ Type:bool
Unit:(no unit assumed)
Controllable:No
Description:Set the enabled status of the MooseObject.
Advanced Parameters
References
- S. K. Au and J. L. Beck.
Estimation of small failure probabilities in high dimensions by subset simulation.
Probabilistic engineering mechanics, 16(4):263–277, 2001.
URL: https://doi.org/10.1016/S0266-8920(01)00019-4.[BibTeX]
- H. S. Li and Z. J. Cao.
Matlab codes of subset simulation for reliability analysis and structural optimization.
Structural and Multidisciplinary Optimization, 54(2):391–410, 2016.
URL: https://doi.org/10.1007/s00158-016-1414-5.[BibTeX]