Delayed GAM reporting model function generator

Usage

gam_delayed_reporting(
  window,
  max_delay = 40,
  ...,
  knots_fn = ~gam_knots(.x, window, ...)
)

Arguments

window

controls the knot spacing in the GAM (only when the default knots_fn is used)

max_delay

the maximum delay we expect to model

...

Named arguments passed on to gam_knots

data

the function will be called with incidence data - a dataframe with columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a `time_period`

Any grouping allowed.

k

an alternative to window: if k is given, the behaviour of the knots will be similar to the default mgcv::s(..., k = ...) parameter.

...

currently not used

knots_fn

a function that takes the data as an input and returns a set of integers as time points for GAM knots, for the s(time) term. The default here provides a roughly equally spaced grid determined by window, but a user-supplied function could do anything. The input to this function is the raw dataframe of data that will be considered for one model fit. It is guaranteed to have at least a time and count column. It is possible to
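
As a rough illustration of the knots_fn contract described above (data frame in, integer time points out), a user-supplied function might look like the sketch below. The name weekly_knots, the 7 unit spacing and the toy data frame are hypothetical. It also assumes the time column can be coerced with as.numeric(), which is demonstrated here only on plain numbers rather than on a real time_period.

```r
# Hypothetical custom knot function: one knot every `by` time units.
# Assumption: `data$time` can be coerced to numeric.
weekly_knots = function(data, by = 7) {
  t = as.numeric(data$time)
  # Evenly spaced grid from the first to the last time point, as integers
  unique(as.integer(round(seq(min(t), max(t), by = by))))
}

# Toy stand-in for the incidence data (plain numeric time, not a time_period)
toy = data.frame(time = 0:28, count = rpois(29, lambda = 10))
kts = weekly_knots(toy)
print(kts)
```

Such a function could then be supplied as gam_delayed_reporting(window = 14, knots_fn = weekly_knots). With a custom knots_fn the window value would presumably no longer determine the knot spacing, since only the default knots_fn is documented as using it.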

Value

a list with 2 entries, model_fn and predict, suitable as the input for poisson_gam_model(model_fn = ..., predict = ...).

Details

This function is used to configure a delayed reporting GAM model. The model is of the form:

count ~ s(time, bs = "cr", k = length(kts)) + s(log(tau), k = 4, pc = max_delay)

where tau is the difference between the time at which the time series was observed and the time of the data point within the time series; this assumes we have multiple observations of the same time series. This function helps specify the knots of the GAM and the maximum expected delay.
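
To make the definition of tau concrete, the sketch below computes it by hand for a toy data set in which the same two time points are observed on two different dates. The data frame snapshots and its values are invented for illustration. The column name obs_time follows the example data, and whether and how the package derives tau internally is not shown here.

```r
# Toy data: the same time series (time = 8, 9) observed at two observation times
snapshots = data.frame(
  obs_time = c(10, 10, 12, 12),  # when the time series was observed
  time     = c(8, 9, 8, 9),      # the time point each count refers to
  count    = c(20, 15, 22, 21)   # counts are revised upwards in later snapshots
)

# tau: delay between observing the series and the data point's own time
snapshots$tau = snapshots$obs_time - snapshots$time

# The delay smooth in the model formula is on log(tau)
snapshots$log_tau = log(snapshots$tau)
print(snapshots)
```

Each time point appears once per snapshot with a different tau, which is what allows the s(log(tau)) term to be estimated separately from the s(time) term.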

Examples


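library(ggoutbreak)  # provides test_delayed_observation and gam_delayed_reporting()
# group the example data by observation time
data = test_delayed_observation %>% dplyr::group_by(obs_time)
# window = 14 sets the knot spacing; delays are modelled up to max_delay = 40
cfg = gam_delayed_reporting(window = 14, max_delay = 40)
fit = cfg$model_fn(data)
summary(fit)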
#> 
#> Family: Negative Binomial(331.222) 
#> Link function: log 
#> 
#> Formula:
#> count ~ s(time, bs = "cr", k = length(kts)) + s(log(tau), k = 4, 
#>     pc = max_delay)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.76943    0.01968   140.7   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Approximate significance of smooth terms:
#>              edf Ref.df Chi.sq p-value    
#> s(time)     6.99      7 222105  <2e-16 ***
#> s(log(tau)) 3.00      3  31119  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> R-sq.(adj) =  0.993   Deviance explained = 99.7%
#> -REML = 8726.6  Scale est. = 1         n = 3240