This performs a range of continuous scoring metrics for each estimate time-point using cumulative distribution functions for each estimate. Point quality metrics are calculated for each estimate provided and summarised. Summarisation is performed using bootstrap resampling to generate confidence intervals for summary statistics, which are presented as median +/1 95% CI.
Arguments
- est
a dataframe of estimates of incidence, growth rate of reproduction number based off a simulation or data with known parameters. Each group in
estis expected to contain multiple estimates and each group is scored separately. Estimates inestmust be in the form of a column namedXXX.cdfcontaining a cumulative distribution function for the estimate andXXX.linkcontaining a link function specification (one ifidentity,logorlogit). These are not generated by default by theggoutbreakestimators but are triggered by setting the option:options("ggoutbreak.keep_cdf"=TRUE)before running the estimator. The CDFs generated will be analytical, if the estimator generates a parametrised output (or a mixture thereof), empirical if the estimator uses resampling, or inferred if the estimator produces quantiles only.- obs
a dataframe of the ground truth, sharing the same grouping and columns as
estwith at least one column(s) namedXXX.obswithXXXbeing e.g.rt,growthorincidenceor any other column group predicted inest(i.e. ifobshas a columnXXX.obs,estmust have one calledXXX.cdf).- lags
a data frame of estimate types and lags as output by
quantify_lag()if multiple models are included then the columns must match those inobs. It must have 2 columns, one calledestimatewith values matchingincidence,rt,growth,proportion,relative.growth, and alagcolumn, with (whole) number of days.- summarise_by
by default every group is treated separately. This can be overridden with a
dplyrspecification of the groupings we want to see in the final summarised output (e.g. if we want to differentiate performance on a particular type of scenario or timeframe). If this is exactlyFALSEthe function will return all the raw point estimates.- bootstraps
the number of bootstrap replicates to draw for assessing metric confidence. If FALSE then no bootstrapping will be done and the metrics returned will have no confidence intervals.
- raw_bootstraps
(defaults to FALSE) return the summary metrics for each bootstrap rather than the quantiles of the summary metrics.
- seed
a random seed for reproducibility
Value
a dataframe of scoring metrics, with one row per group. This includes the following columns:
mean_quantile_bias- the average of the universal residuals. Lower values are better.mean_trans_bias- the bias on the link function scale.link - the link function
mean_bias- the bias on the natural scale (which may be interpreted as additive or multiplicative depending on the link)pit_was- an unadjusted probability integral transform histogram Wasserstein distance from the uniform (lower values are better).unbiased_pit_was- an PIT Wasserstein distance from the uniform, adjusted for estimator bias (lower values are better). This is a measure of calibration.directed_pit_was- a PIT Wasserstein distance from the uniform, directed away from the centre, adjusted for estimator bias (values closer to zero are better, positive values indicate overconfidence, and negative values excessively conservative estimates).percent_iqr_coverage- the percentage of estimators that include the true value in their IQR. For a perfectly calibrated estimate this should be 0.5. Lower values reflect overconfidence, higher values reflect excessively conservative estimates. This is a measure of calibration but is influenced by bias.unbiased_percent_iqr_coverage- the percentage of estimators that include the true value in their IQR once adjusted for bias. This should be 0.5. This is a measure of calibration, and tells you which direction (smaller numbers are over-confident, larger values excessively conservative).mean_prediction_interval_width_50- the prediction interval width is a measure of sharpness (smaller values are sharper). Sharper estimators are superior if they are unbiased and well calibrated.mean_crps- the mean value of the continuous rank probability score for each point estimate (lower values are better)mean_unbiased_crps- the mean value of the continuous rank probability score for each point estimate assessed after adjustment for bias (lower values are better)threshold_misclassification_probability- if a metric has a natural threshold like 1 for Rt then this measures how probable it is that the estimate will propose the epidemic is shrinking when it is growing and vice versa. Lower is better
other outputs are possible if summarise_by is false.
Examples
data = test_poisson_rt_smooth
pipeline = ~ .x %>% poisson_locfit_model(ip = .y, quick=TRUE)
lags = quantify_lag(pipeline, ip = test_ip)
#> Rt estimation using Locfit (approx and assuming independence)
withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
est = data %>% poisson_locfit_model(ip = test_ip, quick=TRUE)
})
#> Rt estimation using Locfit (approx and assuming independence)
#> Warning: Estimates were assumed to be independent, but more that 1% of estimates
#> are at risk of Rt underestimation by more that 0.05 (absolute).
#> We advise re-running supplying a full variance-covariance matrix, or
#> a value to the `raw` parameter, or setting `quick=FALSE`.
if (interactive()) plot_rt(est)+sim_geom_function(data, colour="red")
obs = data %>% dplyr::mutate(rt.obs = rt, incidence.obs = rate)
score_estimate(est,obs,lags) %>% dplyr::glimpse()
#> estimates match true observations using columns: statistic,time
#> matching ground truth observations for: rt,incidence
#> Rows: 2
#> Columns: 58
#> Groups: link, statistic, .type [2]
#> $ link <chr> "log", "log"
#> $ statistic <chr> "infections", "infection…
#> $ .type <chr> "incidence", "rt"
#> $ mean_crps.0.025 <dbl> 1.94219662, 0.03310769
#> $ mean_crps.0.25 <dbl> 2.17239004, 0.03752556
#> $ mean_crps.0.5 <dbl> 2.32341761, 0.03970468
#> $ mean_crps.0.75 <dbl> 2.44710580, 0.04203138
#> $ mean_crps.0.975 <dbl> 2.72600459, 0.04665231
#> $ threshold_misclassification_probability.0.025 <dbl> NA, 0.005194496
#> $ threshold_misclassification_probability.0.25 <dbl> NA, 0.006976495
#> $ threshold_misclassification_probability.0.5 <dbl> NA, 0.008053554
#> $ threshold_misclassification_probability.0.75 <dbl> NA, 0.009258347
#> $ threshold_misclassification_probability.0.975 <dbl> NA, 0.01160723
#> $ mean_trans_bias.0.025 <dbl> -0.01102948, -0.05274525
#> $ mean_trans_bias.0.25 <dbl> 0.009583191, -0.038174945
#> $ mean_trans_bias.0.5 <dbl> 0.02470395, -0.03219066
#> $ mean_trans_bias.0.75 <dbl> 0.04446509, -0.02586515
#> $ mean_trans_bias.0.975 <dbl> 0.08772715, -0.01572738
#> $ mean_bias.0.025 <dbl> 0.9969904, 0.9578061
#> $ mean_bias.0.25 <dbl> 1.0275367, 0.9690985
#> $ mean_bias.0.5 <dbl> 1.2345378, 0.9739418
#> $ mean_bias.0.75 <dbl> 1.4467341, 0.9790137
#> $ mean_bias.0.975 <dbl> 1.733837, 0.987685
#> $ mean_quantile_bias.0.025 <dbl> 0.006081356, -0.126377752
#> $ mean_quantile_bias.0.25 <dbl> 0.05655466, -0.08321926
#> $ mean_quantile_bias.0.5 <dbl> 0.08848174, -0.05922198
#> $ mean_quantile_bias.0.75 <dbl> 0.11650725, -0.03712558
#> $ mean_quantile_bias.0.975 <dbl> 0.174005212, 0.002649273
#> $ mean_prediction_interval_width_50.0.025 <dbl> 4.8967019, 0.1224512
#> $ mean_prediction_interval_width_50.0.25 <dbl> 5.3713646, 0.1341302
#> $ mean_prediction_interval_width_50.0.5 <dbl> 5.646177, 0.140501
#> $ mean_prediction_interval_width_50.0.75 <dbl> 5.9245636, 0.1460532
#> $ mean_prediction_interval_width_50.0.975 <dbl> 6.4682256, 0.1577648
#> $ pit_was.0.025 <dbl> 0.03785947, 0.07260691
#> $ pit_was.0.25 <dbl> 0.05070468, 0.08706268
#> $ pit_was.0.5 <dbl> 0.05753087, 0.09475422
#> $ pit_was.0.75 <dbl> 0.06561711, 0.10252945
#> $ pit_was.0.975 <dbl> 0.08753398, 0.11518441
#> $ unbiased_pit_was.0.025 <dbl> 0.04882549, 0.06913734
#> $ unbiased_pit_was.0.25 <dbl> 0.06318438, 0.08340105
#> $ unbiased_pit_was.0.5 <dbl> 0.07145733, 0.09122696
#> $ unbiased_pit_was.0.75 <dbl> 0.08207681, 0.09897618
#> $ unbiased_pit_was.0.975 <dbl> 0.1086903, 0.1118732
#> $ directed_pit_was.0.025 <dbl> -0.05353149, -0.10850851
#> $ directed_pit_was.0.25 <dbl> -0.03731746, -0.09597521
#> $ directed_pit_was.0.5 <dbl> -0.02899301, -0.08819360
#> $ directed_pit_was.0.75 <dbl> -0.01971303, -0.08087673
#> $ directed_pit_was.0.975 <dbl> -0.002143565, -0.0664757…
#> $ percent_iqr_coverage.0.025 <dbl> 0.484472, 0.656250
#> $ percent_iqr_coverage.0.25 <dbl> 0.5403727, 0.7062500
#> $ percent_iqr_coverage.0.5 <dbl> 0.5652174, 0.7312500
#> $ percent_iqr_coverage.0.75 <dbl> 0.5900621, 0.7562500
#> $ percent_iqr_coverage.0.975 <dbl> 0.6399068, 0.8000000
#> $ unbiased_percent_iqr_coverage.0.025 <dbl> 0.5217391, 0.6562500
#> $ unbiased_percent_iqr_coverage.0.25 <dbl> 0.5714286, 0.7062500
#> $ unbiased_percent_iqr_coverage.0.5 <dbl> 0.5962733, 0.7312500
#> $ unbiased_percent_iqr_coverage.0.75 <dbl> 0.621118, 0.756250
#> $ unbiased_percent_iqr_coverage.0.975 <dbl> 0.6708075, 0.8000000
