Fit empirical distribution to posterior samples for generating more waves
Source:R/abc-workflow.R
posterior_fit_empirical.RdThis function allows "updating" of the prior with an empirical posterior distribution which will retain the bounds of the prior, and is effectively a spline based transform of the prior distribution, based on the density of data in prior space. This gives a clean density when data is close to a prior distribution limit and work better than a standard density.
Arguments
- posteriors_df
a dataframe of posteriors that have been selected by ABC this may include columns for scores, weight and/or simulation outputs (
abc_component_score,abc_summary_distance,abc_weight,abc_simulation) as well as columns matching thepriorsinput specification.- priors_list
a named list of priors specified as a
abc_priorS3 object (seepriors()), this can include derived values as unnamed 2-sided formulae, where the LHS of the formula will be assigned to the value of the RHS, plus optionally a set of constraints as one sided formulae where the RHS of the formulae will resolve to a boolean value.- knots
the number of knots to model the CDF with. Optional, and will be typically inferred from the data size. Small numbers tend to work better if we expect the distribution to be unimodal.
- bw
for Adaptive ABC data distributions are smoothed before modelling empirical CDF. Over smoothing can reduce convergence rate, under-smoothing may result in noisy posterior estimates, and appearance of local modes.
- widen_by
change the dispersion of the empirical proposal distribution in ABC adaptive, preserving the median. This is akin to a nonlinear, heteroscedastic random walk in the quantile space, and can help address over-fitting or local modes in the ABC adaptive waves.
widen_byis an odds ratio and describes how much further from the median any given part of the distribution is after transformation. E.g. if the median of a distribution is zero, and thewiden_byis 2 then the 0.75 quantile will move to the position of the 0.9 quantile. The distribution will stay within the support of the prior. This is by default 1.05 which allows for some additional variability in proposals.
Value
an abc_prior S3 object approximating the distribution of the
posterior samples, and adhering to the support of the provided priors.
Details
This takes weighted posterior samples \(\{(\theta^{(i)}, w^{(i)})\}\) for each parameter \(\theta_j\) and constructs an empirical distribution function \(Q_j(\theta_j)\) that approximates the posterior marginal for that parameter.
For each parameter \(\theta_j\), the empirical distribution \(Q_j\) is
fitted using the empirical() function. The fitting is performed on the
weighted samples \((\theta_{j}^{(i)}, w^{(i)})\). A key feature is the use
of a link function \(h_j(\cdot)\) during the fitting process. This link
function is typically taken from the original prior distribution \(P_j(\theta_j)\)
(i.e., \(h_j = P_j\)). This ensures that the fitted empirical distribution
\(Q_j\) respects the support constraints defined by the prior (e.g., positive
values for a log-normal prior). The fitting effectively occurs in the transformed
space defined by the link: \(h_j(\theta_j^{(i)})\).
The resulting empirical distribution \(Q_j\) is a dist_fns object that
includes the support constraints from the original prior. The set of all fitted
marginal distributions \(\{Q_j\}\) forms the new proposal list for the next
wave.
Additionally, the weighted covariance matrix \(R\) of the samples in the
MVN space (defined by the prior copula) is calculated:
$$
R = \text{Cov}_{w}(\Phi^{-1}(P_1(\theta_1)), \dots, \Phi^{-1}(P_d(\theta_d)))
$$
where \(\Phi^{-1}\) is the quantile function of the standard normal.
This covariance matrix is stored as an attribute ("cor") of the returned
proposal list and is used to induce correlation structure when sampling
new proposals from the empirical distributions in subsequent waves.
Examples
fit = example_smc_fit()
proposals = posterior_fit_empirical(fit$posteriors, fit$priors)
proposals
#> Parameters:
#> * mean: posterior [4.97 ± 0.03]
#> * sd1: posterior [1.99 ± 0.07]
#> * sd2: posterior [0.99 ± 0.04]
#> Constraints:
#> * mean > sd2