Incidence and growth rate from case positive counts
This method uses a case positive count dataset bundled with growth
rates which is age stratified (age grouping is in the class
column). We will look at the age stratification in different vignettes
but in this instance we want to aggregate it into an England wide rate.
This is the purpose of time_aggregate()
which performs a
simple summarisation.
england_covid %>% dplyr::glimpse()
#> Rows: 26,790
#> Columns: 5
#> Groups: class [19]
#> $ date <date> 2023-12-09, 2023-12-09, 2023-12-09, 2023-12-09, 2023-12-09, 202…
#> $ class <fct> 00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 4…
#> $ count <dbl> 24, 8, 8, 4, 21, 20, 29, 36, 41, 59, 53, 54, 56, 54, 67, 72, 56,…
#> $ denom <dbl> 771, 771, 771, 771, 771, 771, 771, 771, 771, 771, 771, 771, 771,…
#> $ time <time_prd> 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409, 1409,…
tmp = ggoutbreak::england_covid %>%
time_aggregate(count=sum(count))
The raw COVID-19 case count on a log1p
scale is the
total detected cases per day.
fit = tmp %>%
poisson_locfit_model()
plot_incidence(fit,raw = tmp, colour="blue",size=0.025)+
scale_y_log1p(n=7)
Major events in the timeseries can be plotted on the axes. We’ve focussed on the first 2 years of the pandemic:
plot_incidence(fit, raw = tmp,events = england_events, colour="blue",size=0.025)+
scale_y_log1p(n=7) + ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")))
The incidence model assumes case rates result from a Poisson process
and the rate is estimated with a time varying locally fitted polynomial
with degree defined by the deg
parameter, using a log link
function, according to the methods of Loader et al. (see
utils::citation("locfit")
). The fitting process is a local
maximum likelihood estimation and uses a bandwidth defined to account
for the data points within window
of the time point being
estimated.
The gradient of the fitted polynomial on the log scale, is the exponential growth rate. This is a scale independent view of the rate of growth of the epidemic. This estimation methodology is compared to consensus estimates from the SPI-M-O UK government advisory group in red, shifted forward in time by 21 days. The SPI-M-O estimates made during the pandemic were retrospective whereas these ones can use information from before and after the time point and now may better represent the timing of changes.
plot_growth_rate(fit,events = england_events, colour="blue")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")), ylim=c(-0.15,0.15))+
ggplot2::geom_errorbar(data=england_consensus_growth_rate,ggplot2::aes(x=date-21,ymin=low,ymax=high),colour="red")
The state of the epidemic is described by both incidence and growth, and the phase plots allow us to see both at different time points. In this case the epidemic state in the 10 weeks leading up to Christmas in 2021, 2022 and 2023:
plot_growth_phase(fit,
timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")),
duration = 70,
interval = 7,
colour="blue",
strip.position = "top"
)
Reproduction number estimation
The growth rate has a unit or “per day” in this example. From it we can derive the reproduction number, using the methods of Wallinga and Lipsitch and an estimate of the infectivity profile of COVID-19. This describes the probability that an infectee is infected x days after the infector, and includes a temporal dimension rendering the reproduction number a dimensionless quantity reflecting the average number of infectees resulting from each infector.
We also have methods to infer the reproduction unmber from modelled
incidence estimates where logarithmic link functions are used. This is
described in the vignette:
vignette("rt-from-incidence", package="ggoutbreak")
ggoutbreak
has an estimate of the infectivity profile
based on a meta-analysis of serial interval estimates of COVID-19. The
infectivity profile is a bootstrapped set of discrete probability
distributions. It is truncated at 14 days.
As with before effective estimates are compared to consensus values from the SPI-M-O group (red):
rt_fit = fit %>% ggoutbreak::rt_from_incidence(ip = covid_ip)
plot_rt(rt_fit, events = england_events, colour="blue")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")), ylim=c(0.6,1.6))+
ggplot2::geom_errorbar(data=england_consensus_rt,ggplot2::aes(x=date-14,ymin=low,ymax=high),colour="red")
EpiEstim
fits for comparison for the same data, and the same infectivity profile
are much more certain and exhibit some oscillation due to the weekly
periodicity of the underlying time series. This oscillation is
accentuated by the truncated infectivity profile we are using here.
rt_epi_fit = tmp %>% ggoutbreak::rt_epiestim(ip = covid_ip,window = 14)
plot_rt(rt_epi_fit, events = england_events, colour="blue")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")), ylim=c(0.6,1.6))+
ggplot2::geom_errorbar(data=england_consensus_rt,ggplot2::aes(x=date-7,ymin=low,ymax=high),colour="red")
Prevalence and growth rate from test positivity rates
Test availability was not consistent during the pandemic. In the
early stages PCR tests were difficult to obtain and case positive
incidence estimates are thought to be a vast underestimate. During
certain parts of the pandemic targeted testing of high risk groups
occurred. Test positivity is a different view on the pandemic and
accounts for some of these biases and introduces others of its own. For
this the data must contain a denom
column which in this
case represents the number of tests conducted:
england_covid_pcr_positivity %>% dplyr::glimpse()
#> Rows: 1,413
#> Columns: 4
#> $ date <date> 2023-12-12, 2023-12-11, 2023-12-10, 2023-12-09, 2023-12-08, 202…
#> $ time <time_prd> 1444, 1443, 1442, 1441, 1440, 1439, 1438, 1437, 1436, 1435,…
#> $ count <dbl> 375, 509, 381, 350, 445, 399, 430, 457, 413, 295, 252, 293, 343,…
#> $ denom <dbl> 1707, 5884, 5514, 6001, 7840, 8333, 8946, 10139, 9805, 6445, 638…
fit2 = england_covid_pcr_positivity %>%
ggoutbreak::proportion_locfit_model()
plot_proportion(fit2, england_covid_pcr_positivity, events = england_events, size=0.25, colour="blue")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")))
In this case the gradient of the proportion on the logistic scale is an estimate of the growth rate. It is in some senses relative to the growth of the testing effort but in this case produces an answer very similar to the incidence model.
plot_growth_rate(fit2, events = england_events, colour="blue")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2022-01-01")), ylim=c(-0.15,0.15))+
ggplot2::geom_errorbar(data=england_consensus_growth_rate,ggplot2::aes(x=date-21,ymin=low,ymax=high),colour="red")
With similar growth rate estimates this method can also theoretically be used to calculate estimates of . Growth-proportion phase diagrams can also compare different points in times or as we will see elsewhere, between different populations.
plot_growth_phase(fit2,
timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")),
duration = 70,
interval = 7,
colour="blue"
)
NHS COVID app
The NHS COVID-19 app performed digital contact tracing. The rate of venue check-ins demonstrates the levels of high risk social contacts however it became optional from Aug 2021. Self isolation alerts peaked in Aug / Sept 2021 during the Delta wave and again in Dec 2021 / Jan 2022 in the Omicron wave. Periods of rapid growth precede increases in the NHS app notifications. (N.B. data from https://www.gov.uk/government/publications/nhs-covid-19-app-statistics)
p1 = plot_incidence(fit,events = england_events, colour="blue", date_breaks="3 months")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2023-07-01")))+
ggplot2::facet_wrap(~"Cases")+
ggplot2::theme(axis.text.x.bottom = ggplot2::element_blank())+
scale_y_log1p()
p2 = plot_growth_rate(fit,events = england_events, colour="blue", date_breaks="3 months")+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2023-07-01")), ylim=c(-0.15,0.15))+
ggplot2::geom_errorbar(data=england_consensus_growth_rate,ggplot2::aes(x=date-21,ymin=low,ymax=high),colour="red")+
ggplot2::facet_wrap(~"Growth rate")+
ggplot2::theme(axis.text.x.bottom = ggplot2::element_blank(),axis.text.x.top = ggplot2::element_blank())
p3 = ggplot2::ggplot(ggoutbreak::england_nhs_app)+
geom_events(events=england_events,hide_labels = TRUE)+
ggplot2::geom_step(ggplot2::aes(x=date, y=alerts/mean(alerts, na.rm=TRUE),colour="alerts"))+
ggplot2::geom_step(ggplot2::aes(x=date, y=visits/mean(visits, na.rm=TRUE),colour="venue visits"))+
ggplot2::geom_rect(ggplot2::aes(xmin=date,xmax=dplyr::lead(date), ymin=0, ymax=alerts/mean(alerts, na.rm=TRUE),fill="alerts"), linewidth=0, alpha=0.2)+
ggplot2::geom_rect(ggplot2::aes(xmin=date,xmax=dplyr::lead(date), ymin=0, ymax=visits/mean(visits, na.rm=TRUE),fill="venue visits"), linewidth=0, alpha=0.2)+
ggplot2::coord_cartesian(xlim=as.Date(c("2020-01-01","2023-07-01")))+
ggplot2::ylab("relative frequency")+
ggplot2::xlab(NULL)+
ggplot2::facet_wrap(~"NHS app")+
ggplot2::scale_color_brewer(palette="Dark2", name=NULL, aesthetics = c("fill","colour"))+
ggplot2::scale_x_date(date_breaks="3 months",date_labels = "%b %y")+
ggplot2::theme(legend.position = "bottom")
p1+p2+p3+patchwork::plot_layout(ncol=1)
The weekly alerts data can be also modelled as a poisson process after which it can be converted to a daily timeseries and aligned to the incidence and growth rate data. With a fully aligned and modelled data set we can test for granger causality:
alerts = ggoutbreak::england_nhs_app %>%
dplyr::rename(count = alerts) %>%
poisson_locfit_model() %>%
rescale_model("1 day") %>%
dplyr::mutate(time = as.time_period(time, fit$time)) %>%
dplyr::select(-dplyr::starts_with("growth")) %>%
dplyr::rename_with(.cols = dplyr::starts_with("incidence"), .fn = ~ stringr::str_replace(.x,"incidence","alert")) %>%
dplyr::inner_join(fit, by="time")
lmtest::grangertest(alert.0.5 ~ growth.0.5, order = 1, data=alerts)
#> Granger causality test
#>
#> Model 1: alert.0.5 ~ Lags(alert.0.5, 1:1) + Lags(growth.0.5, 1:1)
#> Model 2: alert.0.5 ~ Lags(alert.0.5, 1:1)
#> Res.Df Df F Pr(>F)
#> 1 947
#> 2 948 -1 281.93 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmtest::grangertest(growth.0.5 ~ alert.0.5, order = 1, data=alerts)
#> Granger causality test
#>
#> Model 1: growth.0.5 ~ Lags(growth.0.5, 1:1) + Lags(alert.0.5, 1:1)
#> Model 2: growth.0.5 ~ Lags(growth.0.5, 1:1)
#> Res.Df Df F Pr(>F)
#> 1 947
#> 2 948 -1 4.1647 0.04155 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmtest::grangertest(incidence.0.5 ~ alert.0.5, order = 1, data=alerts)
#> Granger causality test
#>
#> Model 1: incidence.0.5 ~ Lags(incidence.0.5, 1:1) + Lags(alert.0.5, 1:1)
#> Model 2: incidence.0.5 ~ Lags(incidence.0.5, 1:1)
#> Res.Df Df F Pr(>F)
#> 1 947
#> 2 948 -1 1.8558 0.1734
lmtest::grangertest(alert.0.5 ~ incidence.0.5, order = 1, data=alerts)
#> Granger causality test
#>
#> Model 1: alert.0.5 ~ Lags(alert.0.5, 1:1) + Lags(incidence.0.5, 1:1)
#> Model 2: alert.0.5 ~ Lags(alert.0.5, 1:1)
#> Res.Df Df F Pr(>F)
#> 1 947
#> 2 948 -1 1.2816 0.2579
The conclusion of this test would be that growth is more predictive of alerts rather than vice versa.