Research

My research sits at the intersection of statistical theory and computer science methodology, part of the modern movement to mine “big data” for fundamentally new science from complicated datasets. Specifically, I seek to illuminate how the nature and amount of regularization can serve as a tool for improved scientific understanding.

Through this lens, my research divides into four intersecting areas: (1) computational approximation methodology, (2) model selection, (3) high-dimensional and nonparametric theory, and (4) applications that draw on the other three. My work explores and exploits the connections between these areas rather than approaching them separately: my contributions have grown out of the pressing need to justify methodology as it is actually implemented in applications, not in a vacuum devoid of empirical motivation. My research program seeks to provide statistical guarantees for the procedures that applied researchers use while also developing new methodology for complicated, high-dimensional problems. Within this context, much of my work involves regularization: the process of mathematically balancing complex but meaningful scientific models against a preference for simple, fundamental structure.
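To make the role of regularization concrete, consider the lasso, which recurs throughout the publications below. Given a response y and a matrix of predictors X, the lasso balances goodness of fit against an ℓ1 preference for sparse coefficient vectors; this is a generic template rather than the estimator of any single paper:

$$
\hat{\beta}_\lambda = \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
$$

The tuning parameter λ ≥ 0 governs the amount of regularization, and several of the papers below study how to choose it, for instance by cross-validation. As a minimal computational sketch, using scikit-learn's off-the-shelf LassoCV on simulated data (not the code from any of these papers):

```python
# Minimal sketch: tune the lasso penalty by 10-fold cross-validation.
# LassoCV is scikit-learn's generic implementation; the data below are
# simulated purely for illustration.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 100, 200                     # high-dimensional setting: p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                      # a sparse "true" coefficient vector
y = X @ beta + rng.standard_normal(n)

fit = LassoCV(cv=10).fit(X, y)      # cross-validate over a lambda path
print(fit.alpha_)                   # the selected penalty level
print(np.flatnonzero(fit.coef_))    # indices of the nonzero coefficients
```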

My work has been supported by grants from the Institute for New Economic Thinking and the National Science Foundation. I was the recipient of an NSF CAREER award in 2018.

Contents

  1. Publications and technical reports
  2. Working papers
  3. Slides for invited talks and posters
  4. Dissertation work

Publications and technical reports

  1. D. Homrighausen and D. J. McDonald, “Compressed and penalized linear regression,” Journal of Computational and Graphical Statistics, forthcoming, 2019+.
  2. A. Khodadadi and D. J. McDonald, “Algorithms for Estimating Trends in Global Temperature Volatility,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), 2019.
  3. D. Homrighausen and D. J. McDonald, “A study on tuning parameter selection for the high-dimensional lasso,” Journal of Statistical Computation and Simulation, vol. 88, pp. 2865–2892, 2018.
  4. D. Homrighausen and D. J. McDonald, “Risk consistency of cross-validation for lasso-type procedures,” Statistica Sinica, vol. 27, no. 3, pp. 1017–1036, 2017.
  5. D. J. McDonald, “Minimax Density Estimation for Growing Dimension,” in Proceedings of the Twentieth International Conference on Artificial Intelligence and Statistics (AISTATS), 2017, vol. 54, pp. 194–203.
  6. L. Ding and D. J. McDonald, “Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression,” Bioinformatics, vol. 33, no. 14, pp. i350–i358, 2017.
  7. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Nonparametric risk bounds for time-series forecasting,” Journal of Machine Learning Research, vol. 18, no. 32, pp. 1–40, 2017.
  8. D. Homrighausen and D. J. McDonald, “On the Nyström and Column-Sampling Methods for the Approximate Principal Components Analysis of Large Data Sets,” Journal of Computational and Graphical Statistics, vol. 25, no. 2, pp. 344–362, 2016.
  9. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimating beta-mixing coefficients via histograms,” Electronic Journal of Statistics, vol. 9, pp. 2855–2883, 2015.
  10. G. Loewenstein, T. Krishnamurti, J. Kopsic, and D. J. McDonald, “Does Increased Sexual Frequency Enhance Happiness?,” Journal of Economic Behavior and Organization, vol. 116, pp. 206–218, 2015.
  11. D. Homrighausen and D. J. McDonald, “Leave-one-out cross-validation is risk consistent for lasso,” Machine Learning, vol. 97, no. 1-2, pp. 65–78, 2014.
  12. D. Homrighausen and D. J. McDonald, “The lasso, persistence, and cross-validation,” in Proceedings of the Thirtieth International Conference on Machine Learning (ICML), 2013, vol. 28, pp. 1031–1039.
  13. D. J. McDonald, “Generalization error bounds for state-space models,” PhD thesis, Carnegie Mellon University, 2012.
  14. J. J. S. Jue, M. J. Press, D. J. McDonald, K. G. Volpp, D. A. Asch, N. Mitra, A. C. Stanowski, and G. Loewenstein, “The impact of price discounts and calorie messaging on beverage consumption: A multi-site field study,” Preventive Medicine, vol. 55, pp. 629–633, 2012.
  15. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimated VC dimension for risk bounds,” 2011.
  16. D. Homrighausen and D. J. McDonald, “Spectral approximations in machine learning,” 2011.
  17. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Estimating beta-mixing coefficients,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2011, vol. 15, pp. 516–524.
  18. D. J. McDonald, C. R. Shalizi, and M. Schervish, “Generalization error bounds for stationary autoregressive models,” 2011.
  19. D. J. McDonald, G. F. Loewenstein, and J. Kadane, “The behavior of weight-loss study participants in response to incentives,” 2009.
  20. D. J. McDonald and D. L. Thornton, “Primer on the Mortgage Market and Mortgage Finance,” The Federal Reserve Bank of St. Louis Review, vol. 90, no. 1, pp. 31–46, 2008.

Working papers

  1. L. Ding and D. J. McDonald, “Sufficient principal component regression for genomics,” submitted, 2019+.
  2. D. J. McDonald, J. Sharpnack, and R. Bassett, “Exponential family trend filtering on grids,” in preparation, 2019+.
  3. D. J. McDonald, M. McBride, Y. Gu, and C. Raphael, “Markov-switching State Space Models for Uncovering Musical Interpretation,” submitted, 2019+.
  4. D. J. McDonald and G. Loewenstein, “Factor Analysis for Panel Data,” in preparation, 2019+.
  5. D. J. McDonald, “Sparse additive state-space models,” in preparation, 2019+.
  6. D. J. McDonald and C. R. Shalizi, “Empirical Macroeconomics and DSGE Modeling in Statistical Perspective,” in preparation, 2019+.
  7. D. J. McDonald and C. R. Shalizi, “Rademacher complexity of stationary sequences,” submitted, 2019+.

Slides for invited talks and posters

  • Markov switching state space models for uncovering musical interpretation (slides)
  • Trend filtering in exponential families (slides)
  • Algorithms for Estimating Trends in Global Temperature Volatility (poster)
  • Regularization, optimization, and approximation: The benefits of a convex combination (slides)
  • Matrix sketching for alternating direction method of multipliers optimization (slides)
  • Statistical implications of (some) computational approximations (slides)
  • A Switching Kalman Filter for Modeling Classical Music Performances (poster)
  • Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression (slides)
  • Compressed and penalized linear regression (slides)
  • Estimating beta mixing coefficients with histograms (slides)
  • Approximation-regularization for analysis of large data sets (slides v1) (slides v2)
  • Risk estimation for high-dimensional lasso regression (slides)
  • Approximate principal components analysis of large data sets (slides v1) (slides v2)
  • Statistical machine learning with structured data (slides)
  • Clustering classical music performance (slides) (poster)
  • The lasso, persistence, and cross-validation (slides) (poster)
  • Nonparametric risk bounds for time series prediction (slides v1) (slides v2) (slides v3)
  • Estimating beta mixing coefficients (poster)
  • Spectral approximation methods: performance evaluations in clustering and classification (slides)
  • Generalization error bounds for state-space models: with an application to economic forecasting (slides)

Dissertation work

My dissertation, “Generalization error bounds for state-space models” (Carnegie Mellon University, 2012), appears as item 13 in the publications list above.
Daniel J. McDonald © 2019. All rights reserved.
