{epiprocess} & {epipredict}

R packages to ramp up forecasting systems

Stanford STATS/BIODS 352 — 12 April 2023
The Covid-19 pandemic required quickly implementing forecasting systems.

- Basic processing — outlier detection, reporting issues, geographic granularity — was implemented in parallel by many groups, and was error prone
- Data revisions complicate evaluation
- Simple models often outperformed complicated ones
- Custom software was not easily adapted or improved by other groups
- It was hard for public health actors to borrow or customize community techniques
{epiprocess}
Data structures: epi_df

An epi_df is a snapshot of a data set, keyed by geo_value and time_value, with optional additional keys (age_group, ethnicity, etc.).

It represents a snapshot that contains the most up-to-date values of the signal variables, as of a given time.
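As a minimal sketch (toy values; as_of date chosen for illustration), a tibble with the two required key columns can be converted with {epiprocess}'s as_epi_df():

```r
library(epiprocess)
library(dplyr)

# Toy snapshot: two locations, three days of a case-rate signal
snapshot <- tibble(
  geo_value  = rep(c("ca", "ny"), each = 3),
  time_value = rep(as.Date("2023-04-01") + 0:2, times = 2),
  case_rate  = c(10.1, 11.3, 12.0, 8.2, 9.0, 9.4)
)

# Record when this snapshot was observed via the as_of metadata
edf <- as_epi_df(snapshot, as_of = as.Date("2023-04-10"))
```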
{epiprocess}
Data structures: epi_archive

An epi_archive is a collection of epi_dfs: for any past date, it can reconstruct the epi_df based on the data that "would have been available at the time".

Revisions: epidemiology data gets revised frequently. (This happens in economics as well.)
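A minimal sketch of how revisions are stored and replayed (toy values): each row records the value of a signal as reported in a particular version of the data, and epix_as_of() extracts the snapshot that was available on a given date.

```r
library(epiprocess)
library(dplyr)

# One observation, reported three times; the value gets revised upward,
# then corrected back down
revisions <- tibble(
  geo_value  = "ca",
  time_value = as.Date("2023-04-01"),
  version    = as.Date("2023-04-02") + c(0, 5, 10),
  case_rate  = c(9.0, 10.5, 10.1)
)

arch <- as_epi_archive(revisions)

# The epi_df that "would have been available" on April 4th
epix_as_of(arch, max_version = as.Date("2023-04-04"))
```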
{epipredict}
A very specialized plug-in to {tidymodels}
Suppose we want to predict new hospitalizations \(y\), \(h\) days ahead, at many locations \(j\).
We’re going to make a new forecast each week.
A simple baseline: for each location, predict \[\hat{y}_{j,i+h} = y_{j,i}\]
Use an AR model with some covariates, for example: \[\hat{y}_{j,i+1} = \mu + a_0 y_{j,i} + a_7 y_{j,i-7} + b_0 x_{j,i} + b_7 x_{j,i-7}\]
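For a single location, the AR model above can be hand-rolled with lagged columns and lm(); {epipredict} automates this across all locations at once. A sketch, where one_geo is a hypothetical data frame with time_value, death_rate (the response y), and case_rate (the covariate x):

```r
library(dplyr)

# Build lagged features and a one-day-ahead target by shifting columns
dat <- one_geo %>%
  arrange(time_value) %>%
  mutate(
    y_lag0  = death_rate, y_lag7 = lag(death_rate, 7),
    x_lag0  = case_rate,  x_lag7 = lag(case_rate, 7),
    y_ahead = lead(death_rate, 1)
  )

# Fit the AR model with covariates via ordinary least squares
fit <- lm(y_ahead ~ y_lag0 + y_lag7 + x_lag0 + x_lag7, data = dat)
```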
{epipredict}
You can do a limited amount of customization. We currently provide a canned forecaster with sensible defaults: predict death_rate, 1 week ahead, using 0-, 7-, and 14-day lags of cases and deaths, with lm for estimation. It also creates prediction "intervals".

The output is a model object that could be reused in the future, along with the predictions for 7 days from now.
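With those defaults, a call can be very short. A sketch, assuming an epi_df named jhu with case_rate and death_rate columns:

```r
library(epipredict)

# Canned AR forecaster: death_rate, 7 days ahead, lags 0/7/14 of both
# predictors, lm engine, plus prediction intervals -- all by default
out <- arx_forecaster(
  epi_data = jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate")
)

out$predictions # quantile forecasts, one row per geo_value
```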
rf <- arx_forecaster(
  epi_data = jhu,
  outcome = "death_rate",
  predictors = c("case_rate", "death_rate", "fb-survey"),
  trainer = parsnip::rand_forest(mode = "regression"), # use ranger
  args_list = arx_args_list(
    ahead = 14, # 2-week horizon
    lags = list(c(0:4, 7, 14), c(0, 7, 14), c(0:7, 14)), # bunch of lags
    levels = c(0.01, 0.025, 1:19/20, 0.975, 0.99), # 23 ForecastHub quantiles
    quantile_by_key = "geo_value" # vary q-forecasts by location
  )
)
# A preprocessing "recipe" that turns raw data into features / response
r <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 14) %>%
  step_epi_naomit()
# A postprocessing routine describing what to do to the predictions
f <- frosting() %>%
  layer_predict() %>%
  layer_threshold(.pred, lower = 0) %>% # predictions/intervals should be non-negative
  layer_add_target_date(target_date = max(jhu$time_value) + 14) %>%
  layer_add_forecast_date(forecast_date = max(jhu$time_value))
# Bundle up the preprocessor, training engine, and postprocessor
# We use quantile regression
ewf <- epi_workflow(r, quantile_reg(tau = c(.1, .5, .9)), f)
# Fit it to data (we could fit this to ANY data that has the same format)
trained_ewf <- ewf %>% fit(jhu)
# Examines the recipe to determine what we need to make the prediction
latest <- get_test_data(r, jhu)
# We could make predictions using the same model on ANY test data
preds <- trained_ewf %>% predict(new_data = latest)
Long book on {epipredict}, in progress…
{epiprocess} & {epipredict} — dajmcdon.github.io/epitooling-stanford-2023