Sufficient principal component regression

Estimates principal component regression (or classification) under the assumption that the loadings are row-sparse. If this assumption is valid, then the features whose loading rows are zero are guaranteed to be irrelevant for predicting the response variable. The result is a sparse model in which the final prediction uses sparse linear combinations of the features.
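
To make the row-sparsity assumption concrete, here is a minimal base R sketch. It does not call suffpcr(). It simulates a rank-one feature matrix whose loading vector is nonzero only in its first five entries, then checks how much of the leading right singular vector's weight falls on those five features. The seed, the variable names, and the use of a plain SVD are illustrative assumptions, not the package's estimator.

```r
set.seed(1)
n <- 100
p <- 50

# Latent factor and a loading vector supported on the first 5 features
U <- rnorm(n)
V <- c(rnorm(5), rep(0, p - 5))
V <- V / sqrt(sum(V^2))

# Strong rank-one signal plus small isotropic noise
X <- 5 * tcrossprod(U, V) + 0.1 * matrix(rnorm(n * p), n)

# Ordinary (non-sparse) leading loading vector from the centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
v1 <- svd(Xc, nu = 0, nv = 1)$v[, 1]

# Share of squared loading weight on the truly active features
support_mass <- sum(v1[1:5]^2)
support_mass
```

In this setting nearly all of the weight sits on the active features, but the remaining entries of v1 are small rather than exactly zero. A row-sparse estimate would set them to zero.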

Usage

suffpcr(
  X,
  Y,
  family = c("gaussian", "binomial"),
  d = 3,
  n_lambda = 10,
  maxnvar = ncol(X),
  lambda = NULL,
  lambda_max = NULL,
  lambda_min = NULL,
  lambda_seq = c("loglinear", "linear"),
  screening = TRUE
)

Source

See the GitHub repository vqv/fps for the original (non-approximate) implementation of fps, on which ours is based, along with the paper "Fantope Projection and Selection" (NeurIPS 2013).

Arguments

X: n by p matrix of features
Y: length n response
family: optional family argument to implement regression ("gaussian", the default) or classification ("binomial")
d: target PC dimension
n_lambda: number of different lambda solutions to examine
maxnvar: optional limit on the number of variables to consider
lambda: optional vector of lambda values to use in the penalty
lambda_max: optional largest value of lambda
lambda_min: optional smallest value of lambda; must be non-negative and less than lambda_max
lambda_seq: whether the lambda sequence is constructed on a loglinear or linear scale between lambda_min and lambda_max
screening: logical; whether to screen features as in Algorithm 2
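
The arguments lambda_min, lambda_max, n_lambda and lambda_seq together describe a grid of penalty values. The sketch below shows one way such a grid could be built in base R. make_lambda_grid() is a hypothetical helper written for illustration only. The increasing ordering and the requirement that lambda_min be strictly positive on the loglinear scale are assumptions, and the package's internal construction may differ.

```r
# Hypothetical helper, not part of the package
make_lambda_grid <- function(lambda_min, lambda_max, n_lambda = 10,
                             lambda_seq = c("loglinear", "linear")) {
  lambda_seq <- match.arg(lambda_seq)
  stopifnot(lambda_min >= 0, lambda_min < lambda_max, n_lambda >= 2)
  if (lambda_seq == "loglinear") {
    # Equal spacing on the log scale needs a strictly positive minimum
    stopifnot(lambda_min > 0)
    exp(seq(log(lambda_min), log(lambda_max), length.out = n_lambda))
  } else {
    # Equal spacing on the original scale
    seq(lambda_min, lambda_max, length.out = n_lambda)
  }
}

lin_grid <- make_lambda_grid(0.01, 1, n_lambda = 5, lambda_seq = "linear")
log_grid <- make_lambda_grid(0.01, 1, n_lambda = 5, lambda_seq = "loglinear")
```

A loglinear grid places more candidate values near lambda_min, which can help when the fit changes most quickly at small penalties. A linear grid spreads them evenly over the whole range.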

Value

An object of class "suffPCR" containing the estimated coefficients, the lambda values, the norms of Vhat, and a number of additional components. Both predict() and coef() methods are available for accessing the results.

Examples

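# Simulate n observations of p features driven by one latent factor
n <- 100
p <- 50
U <- rnorm(n)
# Loading vector with only the first 5 features active, scaled to unit norm
V <- c(rnorm(5), rep(0, p - 5))
V <- V / sqrt(sum(V^2))
# Reference coefficient vector (not used by the call below)
bstar <- V * (5 / (5.1))
# Rank-one signal plus small noise
X <- 5 * tcrossprod(U, V) + 0.1 * matrix(rnorm(n * p), n)
# Response depends on the features only through the latent factor
y <- U + rnorm(n)
out <- suffpcr(X, y, d = 1:3)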