Skip to contents

Robert Challen 1,2; Leon Danon 1,2;

  1. Engineering Mathematics, University of Bristol, Bristol, UK
  2. AI4CI

Introduction

If we have estimated the incidence of a disease ItI_t using a poisson rate using maximum likelihood estimators, the rate is typically a log-normally distributed with parameters μ\mu and σ\sigma. Such a fitted model is shown below on a log1p scale, for the COVID-19 epidemic in England:

It is appealing to use this modelled incidence estimate to calculate an estimate of the reproduction number, RtR_t. Incidence models can be derived in a number of ways, they are easily inspected for error and can be made tolerant of missing values and outliers.

Methods

To use a modelled estimate of incidence to predict RtR_t we need to propagate uncertainty in incidence into our RtR_t estimates. To calculate RtR_t we can use the backwards-looking renewal equations which incorporate the infectivity profile of the disease (ω\omega) at a number of days after infection (τ\tau):

ItLognormal(μt,σt)Rt=ItτωτItτ \begin{align} I_t &\sim Lognormal(\mu_t,\sigma_t) \\ R_t &= \frac{I_t}{\sum_{\tau}{\omega_{\tau}I_{t-\tau}}} \end{align}

giving us:

Rt=Lognormal(μt,σt)τLognormal(μtτ+log(ωτ),σtτ) \begin{align} R_t = \frac{Lognormal(\mu_t,\sigma_t)}{\sum_{\tau}{ Lognormal( \mu_{t-\tau} + log(\omega_{\tau}) , \sigma_{t-\tau}) }} \end{align}

The sum of ii such log normal distributions can be approximated by another log normal (Lo 2013) with parameters μZ\mu_Z and σZ\sigma_Z.

S+=E[iXi]=iE[Xi]=ieμi+12σi2σZ2=1S+2i,jcorijσiσjE[Xi]E[Xj]=1S+2i,jcorijσiσjeμi+12σi2eμj+12σj2μZ=ln(S+)12σZ2 \begin{align} S_+ &= \operatorname{E}\left[\sum_i X_i \right] = \sum_i \operatorname{E}[X_i] = \sum_i e^{\mu_i + \frac{1}{2}\sigma_i^2} \\ \sigma^2_{Z} &= \frac{1}{S_+^2} \, \sum_{i,j} \operatorname{cor}_{ij} \sigma_i \sigma_j \operatorname{E}[X_i] \operatorname{E}[X_j] = \frac{1}{S_+^2} \, \sum_{i,j} \operatorname{cor}_{ij} \sigma_i \sigma_j e^{\mu_i+\frac{1}{2}\sigma_i^2} e^{\mu_j+\frac{1}{2}\sigma_j^2} \\ \mu_Z &= \ln\left( S_+ \right) - \frac{1}{2}\sigma_{Z}^2 \end{align}

The sum term in the denominator of the renewal equations consists of a set of correlated scaled log normal distributions with both scale and correlation defined by the infectivity profile (ω\omega). In our case corijcor_{ij} can be equated to the infectivity profile (ω|ij|\omega_{|i-j|}) when iji \neq j and 1 when i=ji = j. μi\mu_i is μtτ+ln(ωτ)\mu_{t-\tau} + ln(\omega_{\tau}).

St=s=1|ω|ωseμts+12σts2σZ,t=i,j=1|ω|(ω|ij|+I(i,j))ωiωj(σ(ti)eμ(ti)+12σ(ti)2)(σ(tj)eμ(tj)+12σ(tj)2)St2μZ,t=log(St)12σZ,t2 \begin{align} S_{t} &= \sum_{s=1}^{|\omega|} { \omega_s e^{\mu_{t-s} + \frac{1}{2}\sigma_{t-s}^2 }} \\ \sigma_{Z,t} &= \sqrt{ \frac{ \sum_{i,j=1}^{|\omega|} { (\omega_{|i-j|}+I(i,j)) \omega_i \omega_j (\sigma_{(t-i)} e^{\mu_{(t-i)}+\frac{1}{2}\sigma_{(t-i)}^2}) (\sigma_{(t-j)} e^{\mu_{(t-j)}+\frac{1}{2}\sigma_{(t-j)}^2}) } }{S_{t}^2} } \\ \mu_{Z,t} &= \log\left( S_{t} \right) - \frac{1}{2}\sigma_{Z,t}^2 \end{align}

μ\mu is the central estimate of case counts on the log scale, and its standard deviation can also be large. There are numerical stability issues dealing with terms involving e(μ+σ2)e^{(\mu+\sigma^2)}, however keeping everything in log space and using optimised log-sum-exp functions this can be made computationally tractable.

log(St)=log(s=1|ω|eμts+12σts2+log(ωs))log(Tt,τ)=log(ωτ)+log(σ(tτ))+μ(tτ)+12σ(tτ)2)log(cori,j)=log(ω|ij|+I(i=j))log(σZ,t2)=log(i,j=1|ω|elog(cori,j)+log(Tt,i)+log(Tt,j))2log(St)μZ,t=log(St)12σZ,t2 \begin{align} \log(S_{t}) &= \log(\sum_{s=1}^{|\omega|} { e^{\mu_{t-s} + \frac{1}{2}\sigma_{t-s}^2 + \log(\omega_s) }}) \\ \log(T_{t,\tau}) &= \log(\omega_{\tau}) + \log(\sigma_{(t-{\tau})}) + \mu_{(t-{\tau})} + \frac{1}{2}\sigma_{(t-{\tau})}^2) \\ \log(cor_{i,j}) &= \log(\omega_{|i-j|}+I(i=j)) \\ \log(\sigma_{Z,t}^2) &= \log( \sum_{i,j=1}^{|\omega|} { e^{ \log(cor_{i,j}) + \log(T_{t,i}) + \log(T_{t,j}) } }) - 2 \log(S_{t}) \\ \mu_{Z,t} &= \log( S_{t} ) - \frac{1}{2}\sigma_{Z,t}^2 \end{align}

N.B. if we assume the individual estimates of the incidence are uncorrelated this simplifies to:

log(σZ,t2)=log(τ=1|ω|e2log(Tt,τ))2log(St) \begin{align} \log(\sigma_{Z,t}^2) &= \log( \sum_{\tau=1}^{|\omega|} { e^{ 2 \log(T_{t,\tau}) } }) - 2 \log(S_{t}) \end{align}

Empirically there is not a huge amount of difference in estimates between these two forms. If the infectivity profile ω\omega is spread out over a large period then the correlation matrix will be O(ω)2O(\omega)^2 which may predicate this simpler order 1 formulation.

With μZ,t\mu_{Z,t} and σZ,t\sigma_{Z,t} we are left with the final derivation of RtR_t, giving us a distributional form of RtR_t incorporating uncertainty from modelled incidence estimates:

Rt=Lognormal(μt,σt)Lognormal(μZ,t,σZ,t)μRt=μtμZ,tσRt=σt2+σz,t2Rt=Lognormal(μRt,σRt) \begin{align} R_t &= \frac{Lognormal(\mu_t,\sigma_t)} {Lognormal( \mu_{Z,t}, \sigma_{Z,t})} \\ \mu_{R_t} &= \mu_t - \mu_{Z,t} \\ \sigma_{R_t} &= \sqrt{\sigma_t^2+\sigma_{z,t}^2} \\ R_t &= Lognormal(\mu_{R_t}, \sigma_{R_t}) \end{align}

This is conditioned on a single known infectivity profile (ω\omega). In reality there is also uncertainty in the infectivity profile, however we cannot assume any particular distributional form for this. We can use a set of empirical estimates of the infectivity profile (Ω\Omega) to calculate multiple distributional estimates for the reproduction number (Rt,ωR_{t,\omega}) and then combine these as a mixture distribution.

There is not much we can say about this mixture distribution, as it will depend on the nature of the various empirical infectivity profile distributions. However, we can use general properties of mixture distributions to generate estimates for the mean and variance of the reproduction number including uncertainty arising from multiple infection profile estimates (Rt*R_t^*):

E[Rt|ω]=e(μRt,ω12σRt,ω2)V[Rt|ω]=[e(σRt,ω2)1][e2μRt,ω+σRt,ω2]E[Rt*]=1|Ω|ωΩE[Rt|ω]V[Rt*]=1|Ω|(ωΩV[Rt|ω]+E[Rt|ω]2)E[Rt*]2 \begin{align} E[R_t|\omega] &= e^{(\mu_{R_t,\omega} - \frac{1}{2}\sigma_{R_t,\omega}^2)} \\ V[R_t|\omega] &= \big[ e^{(\sigma_{R_t,\omega}^2)} - 1 \big] \big[ e^{2 \mu_{R_t,\omega} + \sigma_{R_t,\omega}^2} \big] \\ E[R_t^*] &= \frac{1}{|\Omega|}\sum_{\omega \in \Omega} E[{R_t|\omega}] \\ V[R_t^*] &= \frac{1}{|\Omega|} \bigg(\sum_{\omega \in \Omega}{V[R_t|\omega]+E[R_t|\omega]^2}\bigg) - E[R_t^*]^2 \\ \end{align}

The cumulative distribution function of the mixture is simply the arithmetic mean of the component cumulative distribution functions (conditioned on each infectivity profile). If Φ\Phi is the cumulative distribution function of the standard normal distribution:

FRt*(x)=1|Ω|ωΩFRt(x|ω)P(Rt*x)=1|Ω|ωΩP(Rt,ωx)P(Rt*x)=1|Ω|ωΩΦ(ln(x)μRt,ωσRt,ω) \begin{align} F_{R_t^*}(x) &= \frac{1}{|\Omega|}\sum_{\omega \in \Omega}F_{R_t}(x|\omega) \\ P(R_t^* \le x) &= \frac{1}{|\Omega|}\sum_{\omega \in \Omega} P(R_{t,\omega} \le x) \\ P(R_t^* \le x) &= \frac{1}{|\Omega|}\sum_{\omega \in \Omega} \Phi\bigg(\frac{ln(x) - \mu_{R_t,\omega}}{\sigma_{R_t,\omega}}\bigg) \end{align}

As the cumulative density function of this mixture distribution is a strictly increasing function specific solutions for median and 95% confidence intervals can be calculated numerically by solving the following equations:

1|Ω|ωΩΦ(ln(q0.025)μRt,ωσRt,ω)0.025=01|Ω|ωΩΦ(ln(q0.5)μRt,ωσRt,ω)0.5=01|Ω|ωΩΦ(ln(q0.975)μRt,ωσRt,ω)0.975=0 \begin{align} \frac{1}{|\Omega|}\sum_{\omega \in \Omega} \Phi\bigg(\frac{ln(q_{0.025}) - \mu_{R_t,\omega}}{\sigma_{R_t,\omega}}\bigg) - 0.025 &= 0 \\ \frac{1}{|\Omega|}\sum_{\omega \in \Omega} \Phi\bigg(\frac{ln(q_{0.5}) - \mu_{R_t,\omega}}{\sigma_{R_t,\omega}}\bigg) - 0.5 &= 0 \\ \frac{1}{|\Omega|}\sum_{\omega \in \Omega} \Phi\bigg(\frac{ln(q_{0.975}) - \mu_{R_t,\omega}}{\sigma_{R_t,\omega}}\bigg) - 0.975 &= 0 \end{align}

Implementation

This method is implemented using the following R function, which is designed for numerical stability and speed. Generating RtR_t estimates given modelled incidence typically occurring in a matter of seconds:

#> function (mu, sigma, omega, mu_t, sigma_t, cor = FALSE, approx = TRUE) 
#> {
#>     omega_m = as.matrix(omega)
#>     omega_m = apply(omega_m, MARGIN = 2, rev)
#>     tmp = apply(omega_m, MARGIN = 2, function(omega) {
#>         log_S_t = .logsumexp(mu_t + sigma_t^2/2 + log(omega))
#>         log_T_t_tau = mu_t + sigma_t^2/2 + log(omega) + log(sigma_t)
#>         if (cor) {
#>             n = length(omega)
#>             idx = 0:(n^2 - 1)
#>             i = idx%/%n
#>             j = idx%%n
#>             log_cor_ij = c(0, log(omega))[abs(i - j) + 1]
#>             log_var_Zt_ij = log_cor_ij + log_T_t_tau[i + 1] + 
#>                 log_T_t_tau[j + 1]
#>         }
#>         else {
#>             log_var_Zt_ij = 2 * log_T_t_tau
#>         }
#>         log_var_Zt = .logsumexp(log_var_Zt_ij) - 2 * log_S_t
#>         var_Zt = exp(log_var_Zt)
#>         mu_Zt = log_S_t - var_Zt/2
#>         mu_Rt = mu - mu_Zt
#>         var_Rt = sigma^2 + var_Zt
#>         return(c(mu_Rt, var_Rt))
#>     })
#>     mu_Rt = tmp[1, ]
#>     var_Rt = tmp[2, ]
#>     sigma_Rt = sqrt(var_Rt)
#>     means = exp(mu_Rt + var_Rt/2)
#>     vars = (exp(var_Rt) - 1) * exp(2 * mu_Rt + var_Rt)
#>     mean_star = mean(means)
#>     var_star = mean(vars + means^2) - mean_star^2
#>     out = tibble::tibble(rt.fit = log(mean_star^2/sqrt(mean_star^2 + 
#>         var_star)), rt.se.fit = sqrt(log(1 + var_star/mean_star^2)))
#>     if (approx) {
#>         out = out %>% .result_from_fit(type = "rt", qfn = function(p) stats::qlnorm(p, 
#>             .$rt.fit, .$rt.se.fit)) %>% .keep_cdf(type = "rt", 
#>             meanlog = .$rt.fit, sdlog = .$rt.se.fit)
#>     }
#>     else {
#>         out = out %>% .result_from_fit(type = "rt", qfn = function(p) .qmixlnorm(p, 
#>             mu_Rt, sigma_Rt)) %>% .keep_cdf(type = "rt", meanlog = list(mu_Rt), 
#>             sdlog = list(sigma_Rt))
#>     }
#>     return(out)
#> }
#> <bytecode: 0x62b552ae74e8>
#> <environment: namespace:ggoutbreak>

Results

Testing this against the incidence model shown above, and comparing the results to the SPI-M-O consensus RtR_t estimates gives us the following time-series for England. This is formally evaluated elsewhere but qualitatively is a good fit. It took a few seconds to calculate the reproduction number from this single time series with 1410 time points , which opens up the possibility of performing RtR_t estimates in fine grained geographical or demographic subgroups.

#>    user  system elapsed 
#>   6.344   0.003   6.347

Conclusion

We present a methodology for deriving RtR_t from modelled estimates of incidence while propagating uncertainty. We demonstrate it produces satisfactory qualitative results against COVID-19 data. This method is relatively quick, fully deterministic, and can be used on top of any statistical models for estimating incidence which use logarithmic link functions.