Skip to contents

Comparisons between epidemic states in different populations separated by for example geography, or in this case age, requires a normalisation by the population size in either absolute or proportional terms. Based on the 2021 census we have some demo data for the demographics of England:

dplyr::glimpse(england_demographics)
#> Rows: 19
#> Columns: 3
#> Groups: class [19]
#> $ class               <fct> 00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 3…
#> $ population          <dbl> 3077000, 3348600, 3413100, 3218900, 3414400, 37154…
#> $ baseline_proportion <dbl> 0.054470107, 0.059278063, 0.060419864, 0.056982069…

Incidence Poisson rate model

A plot of the normalised incidence rates of COVID-19 by population size, shows initially the rate of COVID cases was highest in the elderly. By late 2020 this pattern had changed and rates were uniform across age groups. In early 2021 as vaccination took hold and school testing was rolled out, younger age groups had higher rates of COVID positive tests, with a curious spike in the very young age groups around November 2021. In early 2022 the pattern reversed and the elderly again became the age group with the highest rates, and that pattern has persisted until the present.


tmp = ggoutbreak::england_covid %>%
   ggoutbreak::poisson_locfit_model(window=21) %>%
   ggoutbreak::normalise_incidence(ggoutbreak::england_demographics)

raw_pop = ggoutbreak::england_covid %>% 
  ggoutbreak::normalise_count(ggoutbreak::england_demographics, population_unit=1000, normalise_time="1 year")

plot_incidence(tmp,raw = raw_pop, size=0.25)+scale_y_log1p(n=7)+
  ggplot2::scale_colour_viridis_d(aesthetics = c("fill","colour"))

The use of test positives as a proxy for COVID incidence is clearly potentially biased by testing (particularly in the first wave where testing was limited to those in hospital). A more reliable comparison in this situation would be a test positive proportion, but unfortunately testing rates are not published broken down by age.

The exponential growth rate is already normalised by population size. Comparisons of the growth rate in these populations gives an idea of how tightly coupled they are. In most age groups the epidemic is growing and shrinking in sync apart from possibly the very young. COVID detections in this age group were not particularly reliable though and this would be easy to over-interpret.



plot_growth_rate(tmp)+
  ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))+
  ggplot2::coord_cartesian(ylim=c(-0.15,0.15))
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

The combination of growth and normalised incidence allows us to compare the epidemic state at different time points, in this case Christmas day in 2020, 2021 and 2022. This shows the same data as the previous graphs.


plot_growth_phase(tmp,
    timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")),
    duration = 70, 
    interval = 7
)+
  ggplot2::scale_colour_viridis_d()

Proportion model

There are two possible proportions models which would be of interest here. As mentioned about the proportion of positive tests in each age group would give us a clearer picture of whether the differences between age groups are down to differential testing, but unfortunately not available in this data set.

The second potential use is the distribution of ages in the test positive age group. This age distribution gives us some information about where the burden of disease is in the population but is also biased by test prioritisation.

A multinomial proportion shows similar patterns as the normalised incidence plot above:



tmp2 = ggoutbreak::england_covid %>%
   ggoutbreak::proportion_locfit_model(window=21)

p1 = plot_multinomial(tmp2,normalise = TRUE)+
  ggplot2::scale_fill_viridis_d()

p2 = ggplot2::ggplot(england_demographics)+
  ggplot2::geom_bar(ggplot2::aes(x="baseline",y=population/sum(population)*100,fill=class), stat="identity", position="stack", colour="black", linewidth=0.1)+
  ggplot2::scale_fill_viridis_d(guide="none")+
  ggplot2::xlab(NULL)+
  ggplot2::ylab(NULL)+
  ggplot2::theme(axis.text.y = ggplot2::element_blank())+
  ggplot2::coord_cartesian(expand=FALSE)

p1+p2+patchwork::plot_layout(nrow=1,widths = c(20,1),guides = "collect")

The age distribution of test positives can be normalised by the age distribution of the population. This give us a relative proportion of age groups in people testing positive versus expected in the population. This is conceptually a relative risk but of age group given COVID status

i.e. it is P(age=80+|COVID+)P(age=80+)\frac{P(age = 80+|COVID+)}{P(age = 80+)}

At any point in time for a given population this quantity is centred around 1 and comparing it to growth rate gives us a possibly clearer picture of the trajectory of the relative distribution of COVID in the population. In Xmas 2021 although the majority of cases was in the young, relatively high growth in the elderly population meant that they were catching up, and from above we can see that by early 2022 the elderly had highest COVID positive rates. In 2022 however, the separation of the age groups was established and the trajectories were acting to preserve that separation.


tmp3 = tmp2 %>% infer_risk_ratio(england_demographics)

plot_growth_phase(tmp3,
    timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")),
    duration = 70, 
    interval = 7
)+
  ggplot2::scale_colour_viridis_d()
#> Coordinate system already present. Adding new coordinate system, which will
#> replace the existing one.

Pre test probability

Using the ONS COVID-19 infection survey we can look at population prevalence based on random sampling (panel A).

The proportion of people positive in a random sample of the population can be compared to the proportion of people testing positive out of all the people tested. This ratio is an indication of the pre-test probability of disease in the non-random sample group and gives us some idea of how selectively testing is being applied. If testing is only being done on people with symptoms for example then their pre-test probability of disease is higher than the general population. This is shown in panel B, and it also can be interpreted as how biased the sample that is tested for symptoms is.

p1 = ggoutbreak::plot_prevalence(
    ggoutbreak::england_ons_infection_survey %>% dplyr::filter(geography == "England"),
    events = ggoutbreak::england_events
  )+
  ggplot2::theme(
    axis.title.x = ggplot2::element_blank(), 
    axis.text.x = ggplot2::element_blank(), 
    axis.text.x.bottom = ggplot2::element_blank()
  )+
  ggplot2::ylab("ONS prevalence (%)")


p2 = ggoutbreak::england_covid_pcr_positivity %>%
  proportion_locfit_model() %>%
  dplyr::inner_join(ggoutbreak::england_ons_infection_survey %>% 
                      dplyr::filter(geography=="England") %>% 
                      dplyr::rename_with(.cols = tidyselect::starts_with("prevalence"), .fn = ~stringr::str_replace(.x,"prevalence","ons")), by=c("time")
  ) %>%
  dplyr::transmute(date=date, pre_test_odds = proportion.0.5 / ons.0.5) %>%
  ggplot2::ggplot(ggplot2::aes(x=date,y=pre_test_odds)) + ggplot2::geom_line() +
  ggplot2::geom_hline(yintercept=1, colour="grey40",linetype="dashed") +
  ggplot2::ylab("Pre-test odds ratio") +
  ggoutbreak::geom_events(events = ggoutbreak::england_events,hide_labels = TRUE)+
  ggplot2::theme(
    axis.title.x = ggplot2::element_blank(), 
    axis.text.x = ggplot2::element_blank(), 
    axis.text.x.bottom = ggplot2::element_blank(), 
    axis.text.x.top = ggplot2::element_blank()
  )

Likewise the incidence of cases per capita can be compared to the prevalence. A prevalence-like estimate can be made from the incidence by convolving the incidence estimate with a delay distribution that represents the duration of infection. Duration of infection could be represented either by the sensitivity of a test to detect an infection as a function of time after infection, or as the probability of successful viral culture as a function of time after symptom onset.

In this case the ratio of prevalence derived from positive case counts, resulting from symptomatic testing, and prevalence measured directly on a random sample by the ONS survey has a connection with the ascertainment rate. The actual value of the ascertainment rate calculated by this method depends on assumptions on the duration of infection, however the relative change in this number is informative (panel C).

england_pop = ggoutbreak::england_demographics %>% dplyr::ungroup() %>% dplyr::summarise(population = sum(population))

p3 = ggoutbreak::england_covid_pcr_positivity %>% 
  ggoutbreak::poisson_locfit_model() %>%
  # N.B. `covid_viral_shedding` is technically from symptom onset not infection.
  # We could argue that this is a good thing as case positivity is closer in time
  # to symptom onset however this is forgetting about all the cases that would
  # test positive if they had been tested. An alternative to this would have been 
  # to use `covid_test_sensitivity` which looks at probability of detection from 
  # time of infection but it is only in cases with proven infection so a biased 
  # prior.
  ggoutbreak::infer_prevalence(pop = england_pop, ip = covid_viral_shedding) %>%
  dplyr::inner_join(england_ons_infection_survey %>% 
                      dplyr::filter(geography=="England") %>% 
                      dplyr::rename_with(.cols = tidyselect::starts_with("prevalence"), .fn = ~stringr::str_replace(.x,"prevalence","ons")), by=c("time")
  ) %>%
  dplyr::transmute(date=date, ascertainment = prevalence.0.5 / ons.0.5 * 100) %>%
  ggplot2::ggplot(ggplot2::aes(x=date,y=ascertainment)) + 
  ggplot2::geom_line() + 
  ggplot2::ylab("Ascertainment rate (%)") +
  ggoutbreak::geom_events(events = ggoutbreak::england_events,hide_labels = TRUE)


p1+p2+p3+patchwork::plot_layout(ncol=1)+patchwork::plot_annotation(tag_levels="A")