`{epiprocess}` & `{epipredict}`

`R` packages to ramp up forecasting systems

Daniel J. McDonald, Ryan J. Tibshirani, Logan C. Brooks

and CMU’s Delphi Group

Stanford STATS/BIODS 352 — 12 April 2023

Background

The Covid-19 pandemic required forecasting systems to be implemented quickly.
Basic processing (outlier detection, reporting issues, geographic granularity) was implemented in parallel and was error prone
Data revisions complicate evaluation
Simple models often outperformed complicated ones
Custom software not easily adapted / improved by other groups
Hard for public health actors to borrow / customize community techniques

`{epiprocess}`

Basic processing operations and data structures

General EDA for “panel data”
Calculate rolling statistics
Fill / impute gaps
Examine correlations
Store revision history smartly
Inspect revision patterns
Find / correct outliers

`{epiprocess}` Data Structures

`epi_df`: snapshot of a data set

a tibble with a couple of required columns, geo_value and time_value.
arbitrary additional columns containing “measured” values, called “signals”
additional “keys” that index subsets (age_group, ethnicity, etc.)

epi_df

Represents a snapshot that contains the most up-to-date values of the signal variables, as of a given time.
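
To make the shape concrete, here is a minimal base-R stand-in for a snapshot. It is only a sketch: the real `epi_df` class comes from `{epiprocess}`, and the `as_of` attribute, column values, and dates below are made up for illustration.

```r
# Base-R stand-in for the shape of an epi_df (not the real class).
snapshot <- data.frame(
  geo_value  = c("ca", "ca", "ny", "ny"),
  time_value = as.Date(c("2022-01-01", "2022-01-02", "2022-01-01", "2022-01-02")),
  cases      = c(100, 120, 80, 95)   # a "signal" column (made-up numbers)
)
# Hypothetical metadata: the date on which this snapshot was taken.
attr(snapshot, "as_of") <- as.Date("2022-01-10")

# The two required columns are present, and each (geo_value, time_value)
# pair appears at most once within a snapshot.
required     <- c("geo_value", "time_value")
has_required <- all(required %in% names(snapshot))
unique_keys  <- !any(duplicated(snapshot[required]))
```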

`{epiprocess}` Data Structures

`epi_archive`: collection of `epi_df`s

full version history of a data set
acts like a bunch of epi_dfs
but stored “compactly”
Allows you to do things you would do on an epi_df but based on the data that “would have been available at the time”
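
The "available at the time" idea can be sketched in base R. This is not how `{epiprocess}` implements it; `snapshot_as_of()` is a hypothetical helper and the data are invented. Each row records one update, and a snapshot keeps the newest update per observation among those published on or before a chosen version date.

```r
# Toy version history: one row per (geo_value, time_value, version) update.
archive <- data.frame(
  geo_value  = "ca",
  time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02")),
  version    = as.Date(c("2022-01-02", "2022-01-05", "2022-01-03", "2022-01-06")),
  cases      = c(90, 100, 110, 120)
)

# Hypothetical helper: the snapshot that was available on date `v`.
snapshot_as_of <- function(archive, v) {
  known <- archive[archive$version <= v, ]
  # Order so that the newest version of each observation comes last ...
  known <- known[order(known$geo_value, known$time_value, known$version), ]
  # ... then keep only that last row per (geo_value, time_value).
  keep <- !duplicated(known[c("geo_value", "time_value")], fromLast = TRUE)
  known[keep, c("geo_value", "time_value", "cases")]
}

early <- snapshot_as_of(archive, as.Date("2022-01-03"))
late  <- snapshot_as_of(archive, as.Date("2022-01-06"))
```

Here `early` holds the first reports only, while `late` holds the revised values for the same two days.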

Revisions

Epidemiological data are revised frequently. (This happens in economics as well.)

We may want to use the data “as it looked in the past”
or we may want to examine “the history of revisions”.
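
One way to look at the history of revisions, sketched in base R on invented data: for each reference date, compare the first reported value with the latest one and count the updates in between. The column names are assumptions for this example, not package output.

```r
# Made-up update log for one location.
updates <- data.frame(
  time_value = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02")),
  version    = as.Date(c("2022-01-02", "2022-01-05", "2022-01-09", "2022-01-03")),
  cases      = c(90, 100, 104, 110)
)
updates <- updates[order(updates$time_value, updates$version), ]

# One row per time_value: first report, latest report, number of revisions.
by_day <- split(updates, format(updates$time_value))
revisions <- do.call(rbind, lapply(by_day, function(d) {
  data.frame(
    time_value  = d$time_value[1],
    first_value = d$cases[1],
    last_value  = d$cases[nrow(d)],
    n_revisions = nrow(d) - 1
  )
}))
# Relative change from the first report to the latest one.
revisions$rel_change <-
  (revisions$last_value - revisions$first_value) / revisions$first_value
```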

Revision patterns

Simple sliding computations

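library(epiprocess)
library(dplyr)
# 14-day trailing mean per location; the window argument (`n`) varies by {epiprocess} version
dav14 <- jhu_csse_daily_subset %>%
  group_by(geo_value) %>%
  epi_slide(cases_14dav = mean(cases), n = 14)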
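
For intuition, a grouped trailing mean can be written in base R as below. This is a sketch on toy data with a 3-day window, not the package's implementation: `trailing_mean()` is a hypothetical helper, it assumes rows are sorted by date with no gaps, and it averages whatever is available at the start of each series (the package may treat incomplete windows differently).

```r
# Trailing mean over the current and previous (n - 1) observations.
# Assumes `x` is sorted by time with no missing dates.
trailing_mean <- function(x, n) {
  vapply(seq_along(x), function(i) {
    mean(x[max(1, i - n + 1):i])
  }, numeric(1))
}

toy <- data.frame(
  geo_value  = rep(c("ca", "ny"), each = 4),
  time_value = rep(as.Date("2022-01-01") + 0:3, times = 2),
  cases      = c(10, 20, 30, 40, 1, 2, 3, 4)
)

# ave() applies the function within each geo_value group, preserving row order.
toy$cases_3dav <- ave(toy$cases, toy$geo_value,
                      FUN = function(x) trailing_mean(x, n = 3))
```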

`{epipredict}`

A framework for building customized forecasters from modular components.

Preprocessor: do things to the data before model training
Trainer: train a model on data, resulting in an object
Predictor: make predictions, using a fitted model object
Postprocessor: do things to the predictions before returning
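
The four stages can be pictured as plain functions chained together. The sketch below uses base R only; every function name is hypothetical and the data are made up. In `{epipredict}` these roles are filled by recipes, trainers, and postprocessing layers instead.

```r
# Each stage is an ordinary function; names below are hypothetical.
preprocess <- function(df) {
  # Add a lag-1 feature and drop rows where it is missing.
  df$x_lag1 <- c(NA, head(df$y, -1))
  df[!is.na(df$x_lag1), ]
}
train <- function(df) lm(y ~ x_lag1, data = df)    # returns a fitted object
make_prediction <- function(fit, new) unname(predict(fit, newdata = new))
postprocess <- function(pred) pmax(pred, 0)        # e.g. forbid negative counts

history <- data.frame(y = c(5, 4, 3, 2, 1))        # made-up, linear decline
fit <- train(preprocess(history))

# An input for which the raw linear prediction is negative.
raw  <- make_prediction(fit, data.frame(x_lag1 = 0))
pred <- postprocess(raw)
```

Because each stage only depends on the output of the previous one, any of them can be swapped without touching the others.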

A very specialized plug-in to {tidymodels}

Making dumb (but useful!) forecasts in epidemiology

Suppose we want to predict new hospitalizations \(y\), \(h\) days ahead, at many locations \(j\).
We’re going to make a new forecast each week.

Flatline forecaster

For each location, predict \[\hat{y}_{j,i+h} = y_{j,i}\]
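
A minimal base-R sketch of this rule, on invented data: `flatline()` is a hypothetical helper (not the `{epipredict}` function), and the output column names are chosen for illustration.

```r
# Flatline forecast: carry each location's latest observation forward h days.
flatline <- function(df, h) {
  df <- df[order(df$geo_value, df$time_value), ]
  last <- df[!duplicated(df$geo_value, fromLast = TRUE), ]
  data.frame(
    geo_value     = last$geo_value,
    forecast_date = last$time_value,
    target_date   = last$time_value + h,
    .pred         = last$y
  )
}

obs <- data.frame(
  geo_value  = rep(c("ca", "ny"), each = 3),
  time_value = rep(as.Date("2022-01-01") + 0:2, times = 2),
  y          = c(7, 9, 8, 3, 2, 4)
)
fc <- flatline(obs, h = 7)
```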

AR forecaster

Use an AR model with some covariates, for example: \[\hat{y}_{j,i+1} = \mu + a_0 y_{j,i} + a_7 y_{j,i-7} + b_0 x_{j,i} + b_7 x_{j,i-7}\]
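
The same model can be fit by hand with `lm` for a single location. This is a sketch on simulated series, so the coefficients mean nothing; `lag_by()` is a hypothetical helper. The fitted coefficients play the roles of \(\mu, a_0, a_7, b_0, b_7\).

```r
set.seed(1)
n <- 120
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))    # simulated covariate
y <- as.numeric(arima.sim(list(ar = 0.7), n = n))    # simulated outcome

# Value of v from k days earlier (k >= 1); the first k entries are NA.
lag_by <- function(v, k) c(rep(NA, k), head(v, -k))

d <- data.frame(
  target = c(y[-1], NA),       # y_{i+1}
  y0 = y, y7 = lag_by(y, 7),   # y_i and y_{i-7}
  x0 = x, x7 = lag_by(x, 7)    # x_i and x_{i-7}
)
fit <- lm(target ~ y0 + y7 + x0 + x7, data = d)      # rows with NA are dropped

# Forecast for the day after the last observation.
newest <- d[n, c("y0", "y7", "x0", "x7")]
y_hat  <- unname(predict(fit, newdata = newest))
```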

`{epipredict}`

A forecasting framework

Flatline forecaster
AR-type models
Backtest using the versioned data
Easily create features
Quickly pivot to new tasks
Highly customizable for advanced users

`{epipredict}`

Canned forecasters that work out of the box.

You can do a limited amount of customization.

We currently provide:

Baseline flat-line forecaster
Autoregressive-type forecaster
Autoregressive-type classifier

Basic autoregressive forecaster

Predict death_rate 1 week ahead, using 0-, 7-, and 14-day lags of case and death rates.
Use lm for estimation. Also produce prediction “intervals”.

library(epipredict)
jhu <- case_death_rate_subset # grab some built-in data
canned <- arx_forecaster(
  epi_data = jhu, 
  outcome = "death_rate", 
  predictors = c("case_rate", "death_rate")
)

The output is a model object that could be reused in the future, along with the predictions for 7 days from now.

Adjust lots of built-in options

rf <- arx_forecaster(
  epi_data = jhu, 
  outcome = "death_rate", 
  # "fb-survey" is illustrative: it is not a column of case_death_rate_subset
  predictors = c("case_rate", "death_rate", "fb-survey"),
  trainer = parsnip::rand_forest(mode = "regression"), # use ranger
  args_list = arx_args_list(
    ahead = 14, # 2-week horizon
    lags = list(c(0:4, 7, 14), c(0, 7, 14), c(0:7, 14)), # bunch of lags
    levels = c(0.01, 0.025, 1:19/20, 0.975, 0.99), # 23 ForecastHub quantiles
    quantile_by_key = "geo_value" # vary q-forecasts by location
  )
)
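
A quick base-R check of the quantile vector used above, showing that it really has 23 levels and is symmetric around the median. The variable names are chosen for this example.

```r
q_levels <- c(0.01, 0.025, 1:19 / 20, 0.975, 0.99)
# `:` binds tighter than `/`, so 1:19 / 20 is 0.05, 0.10, ..., 0.95.
n_levels  <- length(q_levels)
is_sorted <- !is.unsorted(q_levels)
# Symmetric: every level q has a partner 1 - q.
symmetric <- isTRUE(all.equal(q_levels, rev(1 - q_levels)))
```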

Do (almost) anything manually

# A preprocessing "recipe" that turns raw data into features / response
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 14) %>%
  step_epi_naomit()

# A postprocessing routine describing what to do to the predictions
f <- frosting() %>%
  layer_predict() %>%
  layer_threshold(.pred, lower = 0) %>% # predictions/intervals should be non-negative
  layer_add_target_date(target_date = max(jhu$time_value) + 14) %>%
  layer_add_forecast_date(forecast_date = max(jhu$time_value))

# Bundle up the preprocessor, training engine, and postprocessor
# We use quantile regression
ewf <- epi_workflow(r, quantile_reg(tau = c(.1, .5, .9)), f)

# Fit it to data (we could fit this to ANY data that has the same format)
trained_ewf <- ewf %>% fit(jhu)

# examines the recipe to determine what we need to make the prediction
latest <- get_test_data(r, jhu)

# we could make predictions using the same model on ANY test data
preds <- trained_ewf %>% predict(new_data = latest)

Pivot to some online examples

Long book on {epipredict}, in progress…

Packages are under active development

Thanks:

The whole CMU Delphi Team (across many institutions)
Optum/UnitedHealthcare, Change Healthcare
Google, Facebook, Amazon Web Services
Quidel, SafeGraph, Qualtrics
Centers for Disease Control and Prevention
Council of State and Territorial Epidemiologists
