Processing and Forecasting with Epidemic Surveillance Data
CANSSI Prairies Workshop 2025
The goal of this worksheet is to practice some of the techniques discussed during the lectures. The plan is to take a “phased” approach, with 15-30 minutes of work after each of the four lectures. The idea is to continue working on roughly the same problem throughout.
Computer setup
The following packages are needed for this worksheet. With luck, you will need only a few more (perhaps none!) to build all the slides in this workshop. But I make no guarantees.
install.packages("remotes")
install.packages("tidyverse")
install.packages("tidymodels")
install.packages("glmnet")
remotes::install_github("cmu-delphi/epidatr")
remotes::install_github("cmu-delphi/epidatasets")
remotes::install_github("cmu-delphi/epiprocess@dev")
remotes::install_github("cmu-delphi/epipredict@dev")
remotes::install_github("dajmcdon/rtestim")
Note that both {epiprocess} and {epipredict} are set to install from the development branch rather than from the main branch. So this is the most up-to-date version, but it is also potentially unstable.
The best place for package documentation is typically the website rather than the R help files. So some useful links are here:
Lecture 1: Introduction to Panel Data in Epidemiology
Let’s examine two different sources of versioned panel data.
Source 1: Respiratory Virus Detection Surveillance System
Navigate to the Dashboard maintained by the Public Health Agency of Canada: https://health-infobase.canada.ca/respiratory-virus-detections/. Let’s focus on Figure 4.
- When was it last updated?
- What is the reference date for the most recent data?
- What geographic regions are available for different data streams?
- Are the revisions tracked?
Source 2: Delphi Epi Portal
- Browse the table.
- What sorts of signals are available?
- For what regions?
- Select a signal with at least state-level geographic coverage that is “Ongoing” (for example, “Covid-Related Doctor Visits”).
- Are revisions tracked?
- What sort of latency does it have?
Extra credit
- Download this season’s RVDSS data. This is most easily done by following the instructions at https://github.com/dajmcdon/rvdss-canada. (My team scrapes the data weekly. Just use the R code at the bottom of the README to “Read in data for a single season”.)
- Download the signal from the Delphi Epidata API that you chose above using {epidatr}.
- Explore and examine both signals using graphics or summary statistics.
Lecture 2: Data Cleaning, Versioning, and Nowcasting
- If you didn’t get a chance earlier, download this season’s RVDSS data.
- Filter to only those rows with geo_type != "province". (This is a bit of a misnomer, we’re keeping some provinces and some regions.)
- Convert it to an epi_archive with as_epi_archive().
- Examine the revision behaviour for one of the signals. The best way is with a plot. (Unfortunately, revision_summary() won’t work for the moment.)
- Is there much backfill? Latency?
- Use epix_as_of_current() to get the most recent snapshot.
- Let’s look at just flu_pct_positive. Calculate the average correlation across geographies at lag 7 and lag 14.
- Calculate the correlation between flu_pct_positive and sarscov2_pct_positive over time and plot the result.
If there’s still time, try to calculate the growth rate for flu_pct_positive.
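The lagged-correlation and growth-rate tasks can be prototyped in base R before reaching for package helpers ({epiprocess} documents epi_cor() and growth_rate() for this purpose). The sketch below is only an illustration on simulated data: the data frame toy, the region names, and the helpers lag_cor() and rel_growth() are all made up here, and it assumes one row per region per week, so that a 7-day lag is one row and a 14-day lag is two rows.

```r
# Toy stand-in for a snapshot: weekly values for two hypothetical regions.
set.seed(1)
n_weeks <- 30
weeks <- seq(as.Date("2024-09-01"), by = "week", length.out = n_weeks)
toy <- data.frame(
  geo_value = rep(c("region_a", "region_b"), each = n_weeks),
  time_value = rep(weeks, times = 2),
  flu_pct_positive = c(
    10 + 8 * sin(seq_len(n_weeks) / 5) + rnorm(n_weeks),
    12 + 6 * sin(seq_len(n_weeks) / 5 + 0.5) + rnorm(n_weeks)
  )
)

# Correlation between a series and itself shifted back by `k` rows.
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

# Per-region lagged correlations, then the average across regions.
# ASSUMPTION: rows are weekly and sorted by time within each region,
# so lag 7 days is k = 1 and lag 14 days is k = 2.
by_geo <- split(toy$flu_pct_positive, toy$geo_value)
avg_cor_lag7 <- mean(sapply(by_geo, lag_cor, k = 1))
avg_cor_lag14 <- mean(sapply(by_geo, lag_cor, k = 2))

# Simple relative-change growth rate per week: (x_t - x_{t-1}) / x_{t-1}.
# The first value is NA because there is no earlier observation.
rel_growth <- function(x) c(NA, diff(x) / head(x, -1))
growth_a <- rel_growth(by_geo$region_a)
```

Relative change is only one of several ways to define a growth rate, and it is noisy on raw data, so smoothing first is usually worth considering.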
Lecture 3:
Here’s a modified version of the SIR simulation function that returns only the new infections:
library(tibble) # provides tibble()
sim_SIR <- function(TT, N = 1000, beta = .1, gamma = .01) {
  # Compartment counts over time: susceptible, infectious, removed
  S <- double(TT)
  I <- double(TT)
  R <- double(TT)
  S[1] <- N - 1
  I[1] <- 1
  # New infections at each time step
  i <- double(TT)
  i[1] <- 1
  for (tt in 2:TT) {
    # New infections and removals are binomial draws
    contagions <- rbinom(1, size = S[tt - 1], prob = beta * I[tt - 1] / N)
    removals <- rbinom(1, size = I[tt - 1], prob = gamma)
    S[tt] <- S[tt - 1] - contagions
    I[tt] <- I[tt - 1] + contagions - removals
    R[tt] <- R[tt - 1] + removals
    i[tt] <- contagions
  }
  tibble(infections = i, time = seq(TT))
}
- Continuing with the RVDSS data (most recent snapshot), make a plot of flu_positive_tests for your favourite region.
- Adjust the parameters N, beta, and gamma to calibrate an SIR model that fits the data closely.
- Use {rtestim} to estimate \(R_t\) for your favourite region.
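Calibration by hand quickly gets tedious, so one option is a crude grid search. The sketch below is an assumption-laden illustration rather than a recommended method: sim_new_infections() is a compact base-R copy of the simulator that returns a plain vector, sir_loss() is a hypothetical helper that averages several simulations and scores them by mean squared error, and the "observed" series is itself simulated so that the example runs without the RVDSS data.

```r
# Compact base-R copy of the worksheet's simulator. It returns only the
# vector of new infections, so no extra packages are needed.
sim_new_infections <- function(TT, N = 1000, beta = 0.1, gamma = 0.01) {
  S <- N - 1
  I <- 1
  i <- double(TT)
  i[1] <- 1
  for (tt in 2:TT) {
    contagions <- rbinom(1, size = S, prob = beta * I / N)
    removals <- rbinom(1, size = I, prob = gamma)
    S <- S - contagions
    I <- I + contagions - removals
    i[tt] <- contagions
  }
  i
}

# Hypothetical loss: mean squared error between the observed series and the
# average of `n_sims` simulated epidemic curves.
sir_loss <- function(observed, N, beta, gamma, n_sims = 20) {
  sims <- replicate(
    n_sims,
    sim_new_infections(length(observed), N = N, beta = beta, gamma = gamma)
  )
  mean((rowMeans(sims) - observed)^2)
}

set.seed(2025)
# Toy "data" generated with known parameters (NOT real surveillance data).
observed <- sim_new_infections(100, N = 1000, beta = 0.3, gamma = 0.1)

# Evaluate the loss on a small grid of (beta, gamma) pairs with N held fixed.
grid <- expand.grid(beta = c(0.1, 0.2, 0.3, 0.4), gamma = c(0.05, 0.1, 0.2))
grid$loss <- mapply(
  function(b, g) sir_loss(observed, N = 1000, beta = b, gamma = g),
  grid$beta, grid$gamma
)
best <- grid[which.min(grid$loss), ]
```

With real data, replace observed with flu_positive_tests for your region and add N to the grid. Keep in mind that positive tests are not the same thing as infections, and that a stochastic epidemic started from a single case can die out early, so the fit will be rough at best.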
Lecture 4: Forecasting and Advanced Topics
- Continuing with the RVDSS data (most recent snapshot), use arx_forecaster() to produce forecasts 1, 2, and 3 weeks after the most recent data for flu_pct_positive.
- You can plot them as follows (for example):
h1 <- arx_forecaster(rvdss, ..., args_list = arx_args_list(ahead = 7))
autoplot(h1)
- Adjust the arguments however you like. You can add other predictors, adjust lags, change forecasting engines, etc.
If you have time, try “Building a forecaster from scratch”. This is more difficult to plot, but you can borrow the code from the lecture slides.
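To get a feel for what a from-scratch forecaster involves, here is a minimal base-R sketch of a direct autoregression fit with lm(). It is not the code from the lecture slides: the series y is simulated, fit_direct_ar() is a hypothetical helper, and the choice of two lags is arbitrary. The idea is to fit one regression per horizon h, with the value h weeks ahead as the response and the current and previous values as predictors.

```r
# Toy weekly series standing in for one region's flu_pct_positive
# (simulated AR(1) noise around a level of 10, NOT real data).
set.seed(42)
y <- as.numeric(arima.sim(list(ar = 0.8), n = 60)) + 10

# Direct h-step-ahead autoregression: regress y[t + h] on y[t] and y[t - 1],
# then predict from the two most recent observations.
fit_direct_ar <- function(y, h) {
  n <- length(y)
  t_idx <- 2:(n - h) # time points that have both lags and a target
  train <- data.frame(
    target = y[t_idx + h],
    lag0 = y[t_idx],
    lag1 = y[t_idx - 1]
  )
  fit <- lm(target ~ lag0 + lag1, data = train)
  newdata <- data.frame(lag0 = y[n], lag1 = y[n - 1])
  unname(predict(fit, newdata = newdata))
}

# Point forecasts 1, 2, and 3 weeks ahead, one model per horizon.
forecasts <- sapply(1:3, function(h) fit_direct_ar(y, h))
```

This produces point forecasts only. Prediction intervals, multiple regions, and proper handling of versioned data are where the {epipredict} tooling earns its keep.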