Skip to contents

This performs a range of continuous scoring metrics for each estimate time-point using cumulative distribution functions for each estimate. Point quality metrics are calculated for each estimate provided and summarised. Summarisation is performed using bootstrap resampling to generate confidence intervals for summary statistics, which are presented as median +/1 95% CI.

Usage

score_estimate(
  est,
  obs,
  lags = NULL,
  summarise_by = est %>% dplyr::groups(),
  bootstraps = 1000,
  raw_bootstraps = FALSE,
  seed = 100
)

Arguments

est

a dataframe of estimates of incidence, growth rate of reproduction number based off a simulation or data with known parameters. Each group in est is expected to contain multiple estimates and each group is scored separately. Estimates in est must be in the form of a column named XXX.cdf containing a cumulative distribution function for the estimate and XXX.link containing a link function specification (one if identity,log or logit). These are not generated by default by the ggoutbreak estimators but are triggered by setting the option: options("ggoutbreak.keep_cdf"=TRUE) before running the estimator. The CDFs generated will be analytical, if the estimator generates a parametrised output (or a mixture thereof), empirical if the estimator uses resampling, or inferred if the estimator produces quantiles only.

obs

a dataframe of the ground truth, sharing the same grouping and columns as est with at least one column(s) named XXX.obs with XXX being e.g. rt,growth or incidence or any other column group predicted in est (i.e. if obs has a column XXX.obs, est must have one called XXX.cdf).

lags

a data frame of estimate types and lags as output by quantify_lag() if multiple models are included then the columns must match those in obs. It must have 2 columns, one called estimate with values matching incidence,rt,growth,proportion,relative.growth, and a lag column, with (whole) number of days.

summarise_by

by default every group is treated separately. This can be overridden with a dplyr specification of the groupings we want to see in the final summarised output (e.g. if we want to differentiate performance on a particular type of scenario or timeframe). If this is exactly FALSE the function will return all the raw point estimates.

bootstraps

the number of bootstrap replicates to draw for assessing metric confidence. If FALSE then no bootstrapping will be done and the metrics returned will have no confidence intervals.

raw_bootstraps

(defaults to FALSE) return the summary metrics for each bootstrap rather than the quantiles of the summary metrics.

seed

a random seed for reproducibility

Value

a dataframe of scoring metrics, with one row per group. This includes the following columns:

  • mean_quantile_bias - the average of the universal residuals. Lower values are better.

  • mean_trans_bias - the bias on the link function scale.

  • link - the link function

  • mean_bias - the bias on the natural scale (which may be interpreted as additive or multiplicative depending on the link)

  • pit_was - an unadjusted probability integral transform histogram Wasserstein distance from the uniform (lower values are better).

  • unbiased_pit_was - an PIT Wasserstein distance from the uniform, adjusted for estimator bias (lower values are better). This is a measure of calibration.

  • directed_pit_was - a PIT Wasserstein distance from the uniform, directed away from the centre, adjusted for estimator bias (values closer to zero are better, positive values indicate overconfidence, and negative values excessively conservative estimates).

  • percent_iqr_coverage - the percentage of estimators that include the true value in their IQR. For a perfectly calibrated estimate this should be 0.5. Lower values reflect overconfidence, higher values reflect excessively conservative estimates. This is a measure of calibration but is influenced by bias.

  • unbiased_percent_iqr_coverage - the percentage of estimators that include the true value in their IQR once adjusted for bias. This should be 0.5. This is a measure of calibration, and tells you which direction (smaller numbers are over-confident, larger values excessively conservative).

  • mean_prediction_interval_width_50 - the prediction interval width is a measure of sharpness (smaller values are sharper). Sharper estimators are superior if they are unbiased and well calibrated.

  • mean_crps - the mean value of the continuous rank probability score for each point estimate (lower values are better)

  • mean_unbiased_crps - the mean value of the continuous rank probability score for each point estimate assessed after adjustment for bias (lower values are better)

  • threshold_misclassification_probability - if a metric has a natural threshold like 1 for Rt then this measures how probable it is that the estimate will propose the epidemic is shrinking when it is growing and vice versa. Lower is better

other outputs are possible if summarise_by is false.

Examples

data = test_poisson_rt_smooth

pipeline = ~ .x %>% poisson_locfit_model(ip = .y)
lags = quantify_lag(pipeline, ip = test_ip)
#> Rt estimation using Locfit (exact with inferred covariance)

withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
   est = data %>% poisson_locfit_model(ip = test_ip)
})
#> Rt estimation using Locfit (exact with inferred covariance)

if (interactive()) plot_rt(est)+sim_geom_function(data, colour="red")

obs = data %>% dplyr::mutate(rt.obs = rt, incidence.obs = rate)
score_estimate(est,obs,lags) %>% dplyr::glimpse()
#> estimates match true observations using columns: statistic,time
#> matching ground truth observations for: rt,incidence
#> Rows: 2
#> Columns: 58
#> Groups: link, statistic, .type [2]
#> $ link                                          <chr> "log", "log"
#> $ statistic                                     <chr> "infections", "infection…
#> $ .type                                         <chr> "incidence", "rt"
#> $ mean_crps.0.025                               <dbl> 1.94219662, 0.03309095
#> $ mean_crps.0.25                                <dbl> 2.17239004, 0.03743459
#> $ mean_crps.0.5                                 <dbl> 2.32341761, 0.03956198
#> $ mean_crps.0.75                                <dbl> 2.44710580, 0.04176993
#> $ mean_crps.0.975                               <dbl> 2.72600459, 0.04627801
#> $ threshold_misclassification_probability.0.025 <dbl> NA, 0.005489954
#> $ threshold_misclassification_probability.0.25  <dbl> NA, 0.00730308
#> $ threshold_misclassification_probability.0.5   <dbl> NA, 0.008418191
#> $ threshold_misclassification_probability.0.75  <dbl> NA, 0.009628172
#> $ threshold_misclassification_probability.0.975 <dbl> NA, 0.01204804
#> $ mean_trans_bias.0.025                         <dbl> -0.01102948, -0.05169073
#> $ mean_trans_bias.0.25                          <dbl> 0.009583191, -0.037295518
#> $ mean_trans_bias.0.5                           <dbl> 0.02470395, -0.03131416
#> $ mean_trans_bias.0.75                          <dbl> 0.04446509, -0.02505914
#> $ mean_trans_bias.0.975                         <dbl> 0.08772715, -0.01502605
#> $ mean_bias.0.025                               <dbl> 0.9969904, 0.9586349
#> $ mean_bias.0.25                                <dbl> 1.0275367, 0.9698996
#> $ mean_bias.0.5                                 <dbl> 1.2345378, 0.9746026
#> $ mean_bias.0.75                                <dbl> 1.4467341, 0.9797131
#> $ mean_bias.0.975                               <dbl> 1.7338368, 0.9884128
#> $ mean_quantile_bias.0.025                      <dbl> 0.006081356, -0.124148079
#> $ mean_quantile_bias.0.25                       <dbl> 0.05655466, -0.08135188
#> $ mean_quantile_bias.0.5                        <dbl> 0.08848174, -0.05787424
#> $ mean_quantile_bias.0.75                       <dbl> 0.11650725, -0.03577158
#> $ mean_quantile_bias.0.975                      <dbl> 0.174005212, 0.003068056
#> $ mean_prediction_interval_width_50.0.025       <dbl> 4.8967019, 0.1249395
#> $ mean_prediction_interval_width_50.0.25        <dbl> 5.3713646, 0.1367294
#> $ mean_prediction_interval_width_50.0.5         <dbl> 5.6461767, 0.1431578
#> $ mean_prediction_interval_width_50.0.75        <dbl> 5.9245636, 0.1487643
#> $ mean_prediction_interval_width_50.0.975       <dbl> 6.4682256, 0.1609658
#> $ pit_was.0.025                                 <dbl> 0.03785947, 0.07516036
#> $ pit_was.0.25                                  <dbl> 0.05070468, 0.08939218
#> $ pit_was.0.5                                   <dbl> 0.05753087, 0.09691206
#> $ pit_was.0.75                                  <dbl> 0.06561711, 0.10484936
#> $ pit_was.0.975                                 <dbl> 0.08753398, 0.11723792
#> $ unbiased_pit_was.0.025                        <dbl> 0.04669956, 0.07233882
#> $ unbiased_pit_was.0.25                         <dbl> 0.06033831, 0.08644211
#> $ unbiased_pit_was.0.5                          <dbl> 0.06824216, 0.09443587
#> $ unbiased_pit_was.0.75                         <dbl> 0.07842087, 0.10197989
#> $ unbiased_pit_was.0.975                        <dbl> 0.1041765, 0.1147653
#> $ directed_pit_was.0.025                        <dbl> -0.0527334, -0.1115264
#> $ directed_pit_was.0.25                         <dbl> -0.03618468, -0.09892036
#> $ directed_pit_was.0.5                          <dbl> -0.02811997, -0.09109671
#> $ directed_pit_was.0.75                         <dbl> -0.01863533, -0.08383854
#> $ directed_pit_was.0.975                        <dbl> -0.001337369, -0.0698936…
#> $ percent_iqr_coverage.0.025                    <dbl> 0.484472, 0.662500
#> $ percent_iqr_coverage.0.25                     <dbl> 0.5403727, 0.7125000
#> $ percent_iqr_coverage.0.5                      <dbl> 0.5652174, 0.7375000
#> $ percent_iqr_coverage.0.75                     <dbl> 0.5900621, 0.7625000
#> $ percent_iqr_coverage.0.975                    <dbl> 0.6399068, 0.8062500
#> $ unbiased_percent_iqr_coverage.0.025           <dbl> 0.5279503, 0.6750000
#> $ unbiased_percent_iqr_coverage.0.25            <dbl> 0.5776398, 0.7187500
#> $ unbiased_percent_iqr_coverage.0.5             <dbl> 0.6024845, 0.7437500
#> $ unbiased_percent_iqr_coverage.0.75            <dbl> 0.6273292, 0.7687500
#> $ unbiased_percent_iqr_coverage.0.975           <dbl> 0.6770186, 0.8064062