Research

My research sits at the intersection of statistical theory and computer science methodology, and it is part of the modern movement to mine “big data” for fundamentally new science from complicated datasets. Specifically, I seek to illuminate how the nature and the amount of regularization can serve as tools for improved scientific understanding.

Through this lens, my research can be divided into four intersecting areas: (1) computational approximation methodology, (2) model selection, (3) high-dimensional and nonparametric theory, and (4) related applications. My work explores and exploits the connections between these areas rather than treating them separately: my contributions grow out of the pressing need to justify methodology as it is actually implemented in applications, rather than in a vacuum devoid of empirical motivation. My research program seeks to provide statistical guarantees for the procedures that applied researchers use, while also developing new methodology for complicated, high-dimensional problems. Within this context, much of my work involves regularization, the process of mathematically balancing complex but meaningful scientific models against a preference for simple, fundamental structures.
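As a concrete illustration (a standard example of regularized estimation rather than a result from any single paper below), the lasso balances fidelity to the data against an $\ell_1$ penalty that encourages sparse, interpretable fits:

$$
\hat{\beta}(\lambda) = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
$$

The tuning parameter $\lambda \geq 0$ controls the amount of regularization: larger values yield sparser, simpler models, while smaller values track the data more closely. How to choose $\lambda$ well, for example by cross-validation, is precisely the kind of question addressed in several of the papers below.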

My work has been supported by grants from the Institute for New Economic Thinking, the National Science Foundation, the Canadian Statistical Sciences Institute, and the Natural Sciences and Engineering Research Council of Canada. I received an NSF CAREER award in 2018.

Publications, submitted manuscripts, and technical reports

  1. X. Liang, A. Cohen, A. S. Heinsfeld, F. Pestilli, and D. J. McDonald, “sparsegl: An R Package for Estimating Sparse Group Lasso,” arXiv, 2022.
  2. E. Y. Cramer et al., “Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States,” Proceedings of the National Academy of Sciences, vol. 119, no. 15, p. e2113561119, 2022.
  3. E. Tuzhilina, T. J. Hastie, D. J. McDonald, J. K. Tay, and R. Tibshirani, “Smooth multi-period forecasting with application to prediction of COVID-19 cases,” arXiv, 2022.
  4. L. Ding, G. E. Zentner, and D. J. McDonald, “Sufficient principal component regression for genomics,” Bioinformatics Advances, vol. 2, p. vbac033, 2022.
  5. V. Sadhanala, R. Bassett, J. Sharpnack, and D. J. McDonald, “Exponential family trend filtering on lattices,” arXiv, 2022.
  6. D. Pham, D. J. McDonald, L. Ding, M. B. Nebel, and A. Mejia, “Less is more: balancing noise reduction and data retention in fMRI with projection scrubbing,” arXiv, 2022.
  7. D. J. McDonald et al., “Can Auxiliary Indicators Improve COVID-19 Forecasting and Hotspot Prediction?,” Proceedings of the National Academy of Sciences, vol. 118, no. 51, p. e2111453118, 2021.
  8. D. J. McDonald, M. McBride, Y. Gu, and C. Raphael, “Markov-switching State Space Models for Uncovering Musical Interpretation,” Annals of Applied Statistics, vol. 15, no. 3, pp. 1147–1170, 2021.
  9. A. Reinhart et al., “An Open Repository of Real-Time COVID-19 Indicators,” Proceedings of the National Academy of Sciences, vol. 118, no. 51, p. e2111452118, 2021.
  10. R. A. Policastro, D. J. McDonald, V. P. Brendel, and G. E. Zentner, “Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR,” NAR Genomics and Bioinformatics, vol. 3, no. 2, pp. 1–10, 2021.
  11. D. Homrighausen and D. J. McDonald, “Compressed and penalized linear regression,” Journal of Computational and Graphical Statistics, vol. 29, no. 2, pp. 309–322, 2020.
  12. D. J. McDonald, “Book Review: Sufficient Dimension Reduction: Methods and Applications with R,” Journal of the American Statistical Association, vol. 115, 2020.
  13. A. Khodadadi and D. J. McDonald, “Algorithms for Estimating Trends in Global Temperature Volatility,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), 2019.
  14. D. Homrighausen and D. J. McDonald, “A study on tuning parameter selection for the high-dimensional lasso,” Journal of Statistical Computation and Simulation, vol. 88, pp. 2865–2892, 2018.
  15. D. J. McDonald, “Minimax Density Estimation for Growing Dimension,” in Proceedings of the Twentieth International Conference on Artificial Intelligence and Statistics (AISTATS), 2017, vol. 54, pp. 194–203.
  16. D. J. McDonald and C. R. Shalizi, “Rademacher complexity of stationary sequences,” arXiv, 2017.
  17. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Nonparametric risk bounds for time-series forecasting,” Journal of Machine Learning Research, vol. 18, no. 32, pp. 1–40, 2017.
  18. L. Ding and D. J. McDonald, “Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression,” Bioinformatics, vol. 33, no. 14, pp. i350–i358, 2017.
  19. D. Homrighausen and D. J. McDonald, “Risk consistency of cross-validation for lasso-type procedures,” Statistica Sinica, vol. 27, no. 3, pp. 1017–1036, 2017.
  20. D. Homrighausen and D. J. McDonald, “On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets,” Journal of Computational and Graphical Statistics, vol. 25, no. 2, pp. 344–362, 2016.
  21. G. Loewenstein, T. Krishnamurti, J. Kopsic, and D. J. McDonald, “Does Increased Sexual Frequency Enhance Happiness?,” Journal of Economic Behavior and Organization, vol. 116, pp. 206–218, 2015.
  22. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimating beta-mixing coefficients via histograms,” Electronic Journal of Statistics, vol. 9, pp. 2855–2883, 2015.
  23. D. Homrighausen and D. J. McDonald, “Leave-one-out cross-validation is risk consistent for lasso,” Machine Learning, vol. 97, no. 1–2, pp. 65–78, 2014.
  24. D. Homrighausen and D. J. McDonald, “The lasso, persistence, and cross-validation,” in Proceedings of the Thirtieth International Conference on Machine Learning (ICML), 2013, vol. 28, pp. 1031–1039.
  25. D. J. McDonald, “Generalization error bounds for state-space models,” PhD thesis, Carnegie Mellon University, 2012.
  26. J. J. S. Jue, M. J. Press, D. J. McDonald, K. G. Volpp, D. A. Asch, N. Mitra, A. C. Stanowski, and G. Loewenstein, “The impact of price discounts and calorie messaging on beverage consumption: A multi-site field study,” Preventive Medicine, vol. 55, pp. 629–633, 2012.
  27. D. Homrighausen and D. J. McDonald, “Spectral approximations in machine learning,” arXiv, 2011.
  28. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimated VC dimension for risk bounds,” arXiv, 2011.
  29. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimating beta-mixing coefficients,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2011, vol. 15, pp. 516–524.
  30. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Generalization error bounds for stationary autoregressive models,” arXiv, 2011.
  31. D. J. McDonald, G. F. Loewenstein, and J. Kadane, “The behavior of weight-loss study participants in response to incentives,” arXiv, 2009.
  32. D. J. McDonald and D. L. Thornton, “Primer on the Mortgage Market and Mortgage Finance,” The Federal Reserve Bank of St. Louis Review, vol. 90, no. 1, pp. 31–46, 2008.

Slides for invited talks and posters

  • Your model is beautiful, but does it predict? (slides)
  • COVID-19 Modelling and Forecasting in the US and Canada: A statistician's perspective (slides)
  • Markov switching state space models for uncovering musical interpretation (slides v1) (slides v2)
  • Trend filtering in exponential families (slides v1) (slides v2)
  • Algorithms for Estimating Trends in Global Temperature Volatility (poster)
  • Regularization, optimization, and approximation: The benefits of a convex combination (slides)
  • Matrix sketching for alternating direction method of multipliers optimization (slides)
  • Statistical implications of (some) computational approximations (slides)
  • A Switching Kalman Filter for Modeling Classical Music Performances (poster)
  • Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression (slides)
  • Compressed and penalized linear regression (slides)
  • Estimating beta mixing coefficients with histograms (slides)
  • Approximation-regularization for analysis of large data sets (slides v1) (slides v2)
  • Risk estimation for high-dimensional lasso regression (slides)
  • Approximate principal components analysis of large data sets (slides v1) (slides v2)
  • Statistical machine learning with structured data (slides)
  • Clustering classical music performance (slides) (poster)
  • The lasso, persistence, and cross-validation (slides) (poster)
  • Nonparametric risk bounds for time series prediction (slides v1) (slides v2) (slides v3)
  • Estimating beta mixing coefficients (poster)
  • Spectral approximation methods: performance evaluations in clustering and classification (slides)
  • Generalization error bounds for state-space models: with an application to economic forecasting (slides)
