Stat 548 - Qualifying Papers

Last updated: 12 March 2025

Choosing a paper

At the end of this document is a list of papers and project ideas that I am interested in supervising as Qualifying Papers (QPs). I am happy to discuss any other paper that you are interested in and think might be appropriate. I am generally interested in theoretical and methodological aspects of statistics and machine learning, especially those that relate to regularization, optimization, model selection, and time series forecasting.

Expectations

If you are interested in doing a QP with me, the first step is to email me to schedule a one-on-one meeting. Please use the words “Qualifying Paper” in the subject line of your email. At our first meeting, please be prepared to discuss:

Your background.
Your long-term research interests (it’s okay if these are not yet well-defined).
Why you are interested in the particular paper/project.
When you will submit your report (typically about four to six weeks after we meet).
The details of the QP project and report.
Any concerns you may have.

Report

The report should have the following structure:¹

Summary (~3 pages): The first section of the report should provide a summary of the paper and the problem(s) it addresses, including its relationship to any previous work, its major contributions (e.g., novel techniques, algorithmic developments, problem formulations, theoretical contributions), and any limitations or shortcomings (e.g., restrictive assumptions, computational constraints, flawed methodology). The aim of this section is for you to synthesize the findings of a body of work and clearly present the important points.
Mini-proposals for research projects: Each proposal should describe a research project that applies, extends, generalizes, adapts, or addresses shortcomings of the QP. Seemingly unrelated ideas inspired by the original QP are also fine. You must write at least one proposal, and you may write more. A proposal should concisely describe: the primary problem to be addressed; an approach (or multiple approaches) for addressing the problem; any technical or conceptual sub-problems; and the potential impact of the project. You are not expected to pursue any of these projects (though we can talk more if you would like to). The aim of this section is to get you thinking creatively about research, and to begin developing the skills necessary for writing research proposals. Each proposal should be no more than 2 pages.
QP-specific project results: Each potential QP listed below has a brief description of a related project. We will discuss the project in detail in our initial meeting, and we can meet again (as many times as necessary) before the report due date. Your grade will not be affected by how good the results look, whether your approach improves on past work, or whether you achieve the initial goal of the project. I will use this project to evaluate your research potential, which includes (among other aspects):
- clearly formulating a research question;
- setting up a useful mathematical framework for the problem;
- thinking creatively and independently to develop a solution;
- relating the problem to existing work, in other fields if necessary;
- being resourceful and asking questions when necessary;
- learning from and moving past the inevitable setbacks;
- reformulating the research problem when necessary;
- implementing new methods in code (when applicable);
- choosing appropriate experiments and metrics;
- communicating and reflecting on progress, setbacks, and results;
- thinking of future research directions.

The report should be submitted as a GitHub repository based on the template here. The template includes a LaTeX style file that should be used for the report. (Detailed usage instructions can be found in the repository’s README file.) Any experimental/numerical results should be reproducible. All code should be reusable, clearly commented/documented, and live in the src/ folder of the same GitHub repository, to which you should give me access as a collaborator. Code can be in any language you wish, though my strong preference is for R or Python.

Resources

Some resources on technical/mathematical writing:

Nancy Heckman’s page on writing
Harry Joe’s advice and writing resources for 548
Trevor Campbell’s How to Explain Things talk
Knuth, Larrabee, and Roberts on mathematical writing
Jenny Bryan’s Happy Git with R
Getting started with Git: chapters 1 and 2 should be all you need for this report

Available papers

~~Heng, Zhou, and Chi (2023). Bayesian Trend Filtering via Proximal Markov Chain Monte Carlo~~
Themes: Non-parametric regression, computation.
Project: Compare the methods with the {trendfilter} package in terms of speed/accuracy. Describe any advantages or disadvantages.
~~Bergmeir, Hyndman, Koo (2017). A note on the validity of cross-validation for evaluating autoregressive time series prediction~~
Themes: Time series, model selection, risk estimation.
Project: Run a similar experiment using AR data with the lasso, comparing cross-validation (CV) against out-of-sample (OOS) estimation of prediction error. Examine the proof of Theorem 1 and describe how we might be able to extend these results to “mixing” processes (or why we can’t).
Anderson (2001). An ensemble adjustment Kalman filter for data assimilation
Themes: Time series, epidemiology.
Project: Read this paper along with Shaman and Karspeck (2012), Pei, Cane, and Shaman (2019), or any additional follow-up work you find relevant. Describe carefully how the EAKF is used for disease models.
Parag, Thompson, Donnelly (2022). Are Epidemic Growth Rates More Informative than Reproduction Numbers?
Themes: Time series, epidemiology.
Project: Discuss how one might modify {rtestim} to estimate growth rates.
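
For the Bergmeir, Hyndman, and Koo project listed above, a minimal sketch of the CV-versus-OOS comparison might look like the following. It is only an illustration under stated assumptions: the AR(2) coefficients, the lag order, the penalty grid, and the 80/20 split are arbitrary choices, and the hand-rolled coordinate-descent lasso (no intercept) stands in for whichever lasso implementation you prefer. None of the function names come from the paper.

```python
import random


def simulate_ar(n, coefs, sigma=1.0, burn_in=100, seed=0):
    """Simulate an AR(p) series with Gaussian innovations (coefs[0] is lag 1)."""
    rng = random.Random(seed)
    y = [0.0] * len(coefs)
    for _ in range(n + burn_in):
        mean = sum(c * y[-k - 1] for k, c in enumerate(coefs))
        y.append(mean + rng.gauss(0.0, sigma))
    return y[-n:]


def lagged_design(y, p):
    """Each row is (y[t-1], ..., y[t-p]); the matching response is y[t]."""
    X = [[y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    return X, y[p:]


def soft_threshold(z, lam):
    return z - lam if z > lam else z + lam if z < -lam else 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def mse(X, y, beta):
    preds = (sum(b * x for b, x in zip(beta, row)) for row in X)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def kfold_cv_error(X, y, lam, k=5, seed=0):
    """Standard K-fold CV: folds are assigned at random, ignoring time order."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errs = []
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        errs.append(mse([X[i] for i in test], [y[i] for i in test], beta))
    return sum(errs) / k


def oos_error(X, y, lam, train_frac=0.8):
    """OOS estimate: fit on the first block, evaluate on the final block."""
    cut = int(train_frac * len(y))
    beta = lasso_cd(X[:cut], y[:cut], lam)
    return mse(X[cut:], y[cut:], beta)


if __name__ == "__main__":
    series = simulate_ar(n=300, coefs=[0.5, -0.3], seed=1)  # assumed AR(2)
    X, y = lagged_design(series, p=5)  # deliberately over-specified lag order
    for lam in (0.01, 0.05, 0.2):  # arbitrary penalty grid
        print(lam, kfold_cv_error(X, y, lam), oos_error(X, y, lam))
```

Repeating this over many simulated series, and comparing both estimates against the error on a large independent test series, would give the kind of comparison the project asks for.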

Footnotes

¹ Thanks to Trevor and Ben. I’m stealing most of this from them.