Statistical approaches to epidemic forecasting

Evaluation and software
Daniel J. McDonald
University of British Columbia
10 February 2023

Mathematical modelling of disease / epidemics is very old

* Daniel Bernoulli (1760) - studies inoculation against smallpox
* John Snow (1855) - cholera epidemic in London tied to a water pump
* Ronald Ross (1902) - Nobel Prize in Medicine for work on malaria
* Kermack and McKendrick (1927-1933) - basic epidemic (mathematical) model

Forecasting is also old, but not that old

### US CDC Flu Challenge began in 2013

> CDC's Influenza Division has collaborated each flu season with external researchers on flu forecasting. <br>CDC has provided forecasting teams data, relevant public health forecasting targets, and forecast accuracy metrics while teams submit their forecasts, which are based on a variety of methods and data sources, each week.

The Covid-19 Pandemic

* CDC pivoted their Flu Challenge to Covid-19 in June 2020
* Similar efforts in Germany and then Europe
* Nothing similar for Canada
* CDC now runs Flu and Covid forecasting hubs simultaneously

Why the Forecast Hubs?

* Collect public forecasts in a standard format and visualize
* Used internally by CDC
* Turns out, most individual teams' forecasts are ... not great
* Combine submissions into an "Ensemble"

# Outline

0. History and background
1. Standard forecasting models
2. Forecast evaluation
3. Gaps in forecasting
4. Some lessons learned
5. Software for the community

# Standard forecasting models
## SIR-type (compartmental) models: stochastic version

### Overall equations:

`\begin{aligned}
C(t+h) & \sim \mathrm{Binom}\left(S(t),\ \frac{\beta}{N} h I(t)\right)\\
D(t+h) & \sim \mathrm{Binom}\left(I(t),\ \gamma h\right)\\
S(t+h) & = S(t) - C(t+h)\\
I(t+h) & = I(t) + C(t+h) - D(t+h)\\
R(t+h) & = R(t) + D(t+h)
\end{aligned}`

### In the deterministic limit, `\(N\rightarrow\infty,\ h\rightarrow 0\)`

`\begin{aligned}
N &= S(0) + I(0) + R(0)\\
\frac{dS}{dt} & = -\frac{\beta}{N} S(t)I(t)\\
\frac{dI}{dt} & = \frac{\beta}{N} I(t)S(t) - \gamma I(t)\\
\frac{dR}{dt} & = \gamma I(t)
\end{aligned}`

**"_The_ SIR model"** is often ambiguous between these two.

## Data issues

- _Ideally_ we'd see S, I, R at all times
- It's easier to observe new infections, C(t+h) = I(t+h) - I(t) + D(t+h)
- Removals by death are easier to observe than removals by recovery, so we mostly see (R(t+h) - R(t)) × (death rate)
- The interval between measurements, say `\(\Delta\)`, is often `\(\gg h\)`
- Measuring I(t) and R(t) (or their rates of change) is hard
    + Testing/reporting is sporadic and error prone
    + Need to model test error (false positives, false negatives) _and_ who gets tested
    + Need to model the lag between testing and reporting
- Parameters (especially `\(\beta\)`) change during the epidemic
    + Changing behavior, changing policy, environmental factors, vaccines, variants, ...

## Connecting to Data

- Likelihood calculations are straightforward if we can measure I(t) and R(t) at all times 0, h, 2h, …, T
- Or I(0), R(0), and all the increments I(t+h) - I(t), R(t+h) - R(t)
- We still have to optimize numerically
- Likelihood calculations already become difficult if the time between observations is `\(\Delta \gg h\)`
    + Generally, `\(\Delta \approx\)` 1 day
    + In principle, this just defines another Markov process with a longer interval `\(\Delta\)` between steps, but to get the likelihood of a `\(\Delta\)` step we have to sum over all possible paths of `\(h\)` steps adding up to it
- Other complications arise if we don't observe all the compartments and/or have a lot of noise in our observations
    + We don't, and we do.
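A minimal Python sketch of the stochastic (chain-binomial) SIR recursion and of its complete-data log-likelihood, assuming N is known and S, I and R are observed at every step of size h. The function names and parameter values are illustrative, not taken from any package; binomial draws are done as sums of Bernoulli draws so the example needs only the standard library.

```python
import math
import random


def simulate_sir(n_pop, i0, beta, gamma, h, n_steps, seed=0):
    """Chain-binomial SIR: C ~ Binom(S, beta/N * h * I), D ~ Binom(I, gamma * h)."""
    rng = random.Random(seed)

    def binom(n, p):
        # Sum of n Bernoulli(p) draws: slow, but plainly a Binomial(n, p) sample.
        return sum(rng.random() < p for _ in range(n))

    s, i, r = n_pop - i0, i0, 0
    path = [(s, i, r)]
    for _ in range(n_steps):
        c = binom(s, min(1.0, beta / n_pop * h * i))   # new infections C(t+h)
        d = binom(i, min(1.0, gamma * h))              # removals D(t+h)
        s, i, r = s - c, i + c - d, r + d
        path.append((s, i, r))
    return path


def binom_logpmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), handling the p = 0 and p = 1 edge cases."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def complete_data_loglik(path, n_pop, beta, gamma, h):
    """Log-likelihood of (beta, gamma) when (S, I, R) is observed at every step."""
    total = 0.0
    for (s_a, i_a, r_a), (s_b, _, r_b) in zip(path, path[1:]):
        c = s_a - s_b   # new infections, read off the drop in S
        d = r_b - r_a   # removals, read off the rise in R
        total += binom_logpmf(c, s_a, min(1.0, beta / n_pop * h * i_a))
        total += binom_logpmf(d, i_a, min(1.0, gamma * h))
    return total


path = simulate_sir(n_pop=1000, i0=20, beta=0.4, gamma=0.1, h=0.25, n_steps=400)
for beta in (0.4 / 3, 0.4, 1.2):
    ll = complete_data_loglik(path, 1000, beta, 0.1, 0.25)
    print(f"beta = {beta:.3f}: log-likelihood = {ll:.1f}")
```

With several hundred observed infections, the log-likelihood at the true β is far above its value at β/3 or 3β. Once observations are `\(\Delta \gg h\)` apart, the per-step C and D are no longer observed, and `complete_data_loglik` would have to be replaced by a sum over all the unobserved h-step paths.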
## Connecting to Data

- Often more tractable to avoid the likelihood (conditional least squares, simulation-based inference)
- Intrinsic issue: initially, everything just looks exponential
    + So it's hard to discriminate between distinct models
    + So even assuming an SIR model, it's easier to estimate `\(\beta - \gamma\)` than `\((\beta, \gamma)\)` or `\(\beta/\gamma\)`
- Can sometimes **calibrate** or fix the parameters based on other sources
    + E.g., `\(1/\gamma =\)` average time someone is infectious, which could be determined from clinical studies / observations

> I have been thinking about how different people interpret data differently. And made this xkcd style graphic to illustrate this.
>
> (Jens von Bergmann, @vb_jens, March 17, 2021)

## These models fit well "in sample"

* Track observed cases closely (they should)
* Can provide nuanced policy advice on some topics
* Many questions depend on modulating `\(\beta\)`
    1. What happens if we lock down?
    2. What happens if we mask?
    3. What happens if we have school online?
    4. Vaccine passport?
* Vaccination modeling is easier: it directly removes susceptibles
* What about out-of-sample?
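A small illustration of why early data pin down `\(\beta - \gamma\)` but not `\(\beta/\gamma\)`. The sketch solves the deterministic SIR equations by forward Euler for two hypothetical parameter pairs with the same difference (0.2) but ratios of 3 and about 1.67, then fits a log-linear slope to the first 10 days. The function names and parameter values are illustrative choices, not from the original analysis.

```python
import math


def sir_euler(beta, gamma, n_pop, i0, t_max, dt=0.01):
    """Forward-Euler solution of the deterministic SIR ODEs; returns (times, I)."""
    s, i = n_pop - i0, float(i0)
    times, infectious = [0.0], [i]
    for k in range(1, int(round(t_max / dt)) + 1):
        new_inf = beta / n_pop * s * i * dt   # flow S -> I over one step
        removed = gamma * i * dt              # flow I -> R over one step
        s, i = s - new_inf, i + new_inf - removed
        times.append(k * dt)
        infectious.append(i)
    return times, infectious


def log_linear_slope(times, values):
    """Least-squares slope of log(values) on times, i.e. the exponential growth rate."""
    logs = [math.log(v) for v in values]
    t_bar = sum(times) / len(times)
    y_bar = sum(logs) / len(logs)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


# Same beta - gamma = 0.2, very different beta / gamma.
for beta, gamma in [(0.3, 0.1), (0.5, 0.3)]:
    t, i = sir_euler(beta, gamma, n_pop=1_000_000, i0=10, t_max=10)
    print(f"beta/gamma = {beta / gamma:.2f}, early growth rate = {log_linear_slope(t, i):.3f}")
```

Both fits return a growth rate of essentially 0.2, and the two infectious curves are nearly indistinguishable over those 10 days. If `\(1/\gamma\)` is instead calibrated externally (say, a hypothetical infectious period of 10 days, so `\(\gamma = 0.1\)`), the fitted growth rate r gives `\(\beta \approx r + \gamma\)`.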
[Figure: a forecast interval annotated with its width, under prediction (AE to the top), and over prediction (AE to the bottom)]

1. Data types
* `epi_df`: basically a `data.frame` but with important metadata
    1. `as_of` tag to denote the data vintage
    2. time type
    3. geo type
    4. additional keys (e.g., age, gender, race)
* `epi_archive`: a collection of `epi_df`s of different vintages
    * But stored so as to eliminate redundancies
    * Allows for important operations (filling, merging, snapshots, …)
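`epi_archive` itself is an R object, and its interface is not reproduced here. The hypothetical Python class below only sketches the idea described above: store a value only when a new vintage changes it, and rebuild the data as it looked on a given `as_of` date. It assumes vintages are added in increasing version order; the dates and counts are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class VersionedSeries:
    """Hypothetical, simplified archive of one signal's data vintages.

    rows maps each time_value to [(version, value), ...]; a row is stored only when a
    new vintage changes the value. Vintages must be added in increasing version order.
    """
    rows: Dict[date, List[Tuple[date, float]]] = field(default_factory=dict)

    def update(self, version: date, snapshot: Dict[date, float]) -> None:
        """Record the full snapshot released on `version`, keeping only changes."""
        for time_value, value in snapshot.items():
            history = self.rows.setdefault(time_value, [])
            if not history or history[-1][1] != value:
                history.append((version, value))

    def as_of(self, version: date) -> Dict[date, float]:
        """Rebuild the data exactly as it was available on `version`."""
        out = {}
        for time_value, history in self.rows.items():
            known = [value for v, value in history if v <= version]
            if known:
                out[time_value] = known[-1]
        return out


arch = VersionedSeries()
arch.update(date(2023, 1, 3), {date(2023, 1, 1): 90, date(2023, 1, 2): 40})
arch.update(date(2023, 1, 4), {date(2023, 1, 1): 100, date(2023, 1, 2): 75, date(2023, 1, 3): 30})
print(arch.as_of(date(2023, 1, 3)))   # what a forecaster could have seen on Jan 3
print(arch.as_of(date(2023, 1, 4)))   # the revised data one day later
```

Re-sending an unchanged snapshot adds no rows, which is the redundancy the archive is meant to avoid.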
Fundamental functionality — the `slide()`
* Rolling correlations
* Moving averages
* Outlier correction
* Arbitrary functions

2. We currently provide:
- Baseline flat-line forecaster
- Autoregressive forecaster (not a classical "AR" model; you don't want that)
- Autoregressive classifier
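A minimal sketch of the two simplest forecasters, in Python with hypothetical function names rather than the package's own. The flat-line baseline repeats the last observation. For the autoregressive forecaster we assume a "direct" setup: regress the value `ahead` steps later on the current value by least squares and plug in the latest observation, instead of iterating a one-step AR model. The real forecaster may use more lags, more predictors, and quantiles.

```python
def flatline_forecast(y, ahead):
    """Baseline: every horizon gets the last observed value."""
    return y[-1]


def direct_forecast(y, ahead):
    """Hypothetical direct forecaster: least-squares fit of y[t + ahead] on y[t].

    Needs ahead >= 1, len(y) > ahead + 1, and a non-constant predictor.
    """
    x = y[:-ahead]          # predictor: value at time t
    target = y[ahead:]      # response: value at time t + ahead
    x_bar = sum(x) / len(x)
    t_bar = sum(target) / len(target)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - t_bar) for a, b in zip(x, target))
    slope = sxy / sxx
    intercept = t_bar - slope * x_bar
    return intercept + slope * y[-1]   # plug in the most recent observation


y = [10 + 2 * k for k in range(20)]    # toy series 10, 12, ..., 48
print(flatline_forecast(y, ahead=7), direct_forecast(y, ahead=7))
```

On this straight-line toy series, the direct forecaster extrapolates the trend (62 at seven steps ahead) while the flat-line baseline stays at 48.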

### A framework for creating custom forecasters out of modular components.

1. Preprocessor: do things to the data before model training
2. Trainer: train a model on data, resulting in a fitted model object
3. Predictor: make predictions, using a fitted model object
4. Postprocessor: do things to the predictions before returning
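The four stages can be sketched as plain function composition. The `Forecaster` class and every stage below (trailing mean, drift trainer, clipping) are hypothetical illustrations, not the package's API; the point is only that each stage can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Forecaster:
    """Hypothetical four-stage forecaster; each stage is an ordinary function."""
    preprocess: Callable[[Sequence[float]], List[float]]
    train: Callable[[List[float]], object]
    predict: Callable[[object, List[float]], float]
    postprocess: Callable[[float], float]

    def forecast(self, data: Sequence[float]) -> float:
        prepared = self.preprocess(data)        # 1. preprocessor
        model = self.train(prepared)            # 2. trainer -> fitted model object
        raw = self.predict(model, prepared)     # 3. predictor
        return self.postprocess(raw)            # 4. postprocessor


def trailing_mean(y: Sequence[float], window: int = 3) -> List[float]:
    """Preprocessor: mean of the last `window` values (fewer at the start)."""
    out = []
    for k in range(len(y)):
        chunk = y[max(0, k - window + 1): k + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def drift_trainer(y: List[float]) -> float:
    """Trainer: the fitted 'model' is the average change per time step."""
    return (y[-1] - y[0]) / (len(y) - 1)


def make_drift_predictor(ahead: int) -> Callable[[float, List[float]], float]:
    """Predictor factory: extrapolate the last value `ahead` steps along the drift."""
    def predict(slope: float, y: List[float]) -> float:
        return y[-1] + ahead * slope
    return predict


def clip_nonnegative(value: float) -> float:
    """Postprocessor: case counts cannot be negative."""
    return max(0.0, value)


fc = Forecaster(trailing_mean, drift_trainer, make_drift_predictor(ahead=7), clip_nonnegative)
print(fc.forecast([12, 10, 8, 6, 4, 2]))
```

Replacing `clip_nonnegative` with an identity postprocessor returns the raw extrapolation (about -7.2 here), which shows why a postprocessing step that enforces nonnegative counts is useful.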

A very specialized plug-in to `{tidymodels}`

* **Data is trouble**
    1. We've spent lots of time trying to make data available
    2. Even when Public Health doesn't cooperate
    3. Nowcasting
    4. Low SNR
    5. Very nonstationary
* **Forecast evaluation is not settled**
    1. Not optimized to predict turning points
    2. How do we create ensembles?
    3. Would it be better to predict squishy concepts: "surge", "upswing", "big upswing"?
* **Hugely important to backtest properly**
    1. Data is constantly revised
    2. We see up to 10% "improvement" if we use finalized data
* **Understanding the spatio-temporal dynamics is open**
    1. How do "waves" propagate?
    2. How important is mixing?
    3. Seasonality?
    4. Effects of NPIs?
* **True counterfactual causal inference is open**
* **We have a massive survey dataset**
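A toy backtest illustrating the revision problem, with made-up numbers (not the 10% figure above). Every finalized value is 100, but on each forecast date the most recent value has only been reported as 60. The function name and data are hypothetical; the forecaster is the flat-line baseline.

```python
def backtest_flatline(vintages, finalized, forecast_dates, ahead, use_finalized):
    """Mean absolute error of a flat-line forecaster over several forecast dates.

    vintages: dict mapping forecast date d -> list of values as reported on date d
    finalized: list of final values, indexed by time
    use_finalized: if True, the forecaster (improperly) sees the revised data
    """
    errors = []
    for d in forecast_dates:
        seen = finalized[: d + 1] if use_finalized else vintages[d]
        forecast = seen[-1]                                  # flat-line forecast
        errors.append(abs(finalized[d + ahead] - forecast))  # scored against final data
    return sum(errors) / len(errors)


finalized = [100] * 12
# Hypothetical backfill: on date d the latest value is reported as only 60.
vintages = {d: finalized[:d] + [60] for d in range(12)}
for use_finalized in (False, True):
    mae = backtest_flatline(vintages, finalized, [3, 4, 5, 6], ahead=2, use_finalized=use_finalized)
    print(f"use_finalized={use_finalized}: MAE = {mae}")
```

The forecaster evaluated on data as it was actually available is off by 40 every time, while the same forecaster given finalized data looks perfect. A proper backtest has to use the as-of data.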

[We're building software so others can use it.]

Many thanks

* Ryan, Logan, Addison, Rob, Larry, Valerie, Alden, and the rest of Delphi
* Jens, Dean, Dan, Sally and the BC COVID Modelling group
* Funding to me from NSERC, CANSSI
* I get to benefit from the results of funding from Google, Facebook, Amazon, Change Healthcare, Quidel, SafeGraph, Qualtrics, CDC, CSTE
* Forecast evaluation from Reich Lab and Delphi

Questions?
On these or other issues.