Title: | Cross-Validated (Post-) Lasso |
---|---|
Description: | Built on top of the 'glmnet' library by Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01>, the 'plasso' package follows Knaus (2022) <doi:10.1093/ectj/utac015> and comes up with two functions that estimate least squares Lasso and Post-Lasso models. The plasso() function adds coefficient paths for a Post-Lasso model to the standard 'glmnet' output. On top of that cv.plasso() cross-validates the coefficient paths for both the Lasso and Post-Lasso model and provides optimal hyperparameter values for the penalty term lambda. |
Authors: | Glaisner Stefan [aut, cre], Knaus Michael C. [ctb] |
Maintainer: | Glaisner Stefan <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-10-29 05:27:01 UTC |
Source: | https://github.com/rm-1997/plasso |
Extract coefficients from a cv.plasso object
Extract coefficients for both Lasso and Post-Lasso from a cv.plasso object.
## S3 method for class 'cv.plasso'
coef(object, ..., s = c("optimal", "all"), se_rule = 0)
object |
|
... |
Pass generic |
s |
Determines whether coefficients are extracted for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the specified standard error-rule. |
se_rule |
If equal to 0, coefficients at the cross-validated MSE minimum are returned (default). Negative values move in the direction of smaller models, positive values in the direction of larger models (e.g. se_rule = -1 gives the 1-standard-error rule). |
List object containing coefficients for both the Lasso and Post-Lasso models respectively.
lasso |
Sparse |
plasso |
Sparse |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get estimated coefficients along whole lambda sequence
coefs = coef(p.cv, s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
coef(p.cv, s="optimal", se_rule=-1)
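The se_rule argument can be read as a shift along the lambda path relative to the MSE minimum. The following base R sketch shows one plausible version of such a rule; it is not the package's implementation. The helper pick_lambda_index and its inputs mean_mse and se_mse are hypothetical names, and the sketch assumes that index 1 corresponds to the largest lambda (the smallest model), as in a glmnet path.

```r
# Hypothetical helper, not part of 'plasso': choose an index on the lambda path.
# Assumption: index 1 is the largest lambda, i.e. the smallest model.
pick_lambda_index <- function(mean_mse, se_mse, se_rule = 0) {
  i_min <- which.min(mean_mse)
  if (se_rule == 0) return(i_min)
  # tolerated MSE: minimum plus |se_rule| standard errors at the minimum
  threshold <- mean_mse[i_min] + abs(se_rule) * se_mse[i_min]
  # candidate indices on the requested side of the minimum
  side <- if (se_rule < 0) seq_len(i_min) else seq(i_min, length(mean_mse))
  ok <- side[mean_mse[side] <= threshold]
  # negative rule: smallest model within tolerance; positive rule: largest
  if (se_rule < 0) min(ok) else max(ok)
}

mean_mse <- c(5, 3, 2.2, 2.0, 2.1, 2.6)  # made-up cross-validated MSEs
se_mse   <- rep(0.3, 6)                  # made-up standard errors
pick_lambda_index(mean_mse, se_mse, se_rule = -1)
```

With se_rule = 0 the sketch simply returns the position of the MSE minimum, which corresponds to the documented default.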
Extract coefficients from a plasso object
Extract coefficients for both Lasso and Post-Lasso from a plasso object.
## S3 method for class 'plasso'
coef(object, ..., s = NULL)
object |
|
... |
Pass generic |
s |
If NULL, coefficients are returned for all lambda values. If a value is provided, the closest lambda value of the plasso object is used. |
List object containing coefficients that are associated with either all values along the lambda input sequence or for one specifically given lambda value for both the Lasso and Post-Lasso models respectively.
lasso |
Sparse |
plasso |
Sparse |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# get estimated coefficients along whole lambda sequence
coefs = coef(p)
head(coefs$plasso)
# get estimated coefficients for specific lambda approximation
coef(p, s=0.05)
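The "closest lambda value" behaviour of the s argument can be illustrated with a short base R sketch. The vector lambda_path below is made up and closest_lambda_index is a hypothetical helper; how the package resolves ties or performs the lookup internally is not stated in this documentation.

```r
# Hypothetical helper, not part of 'plasso': index of the lambda nearest to s.
closest_lambda_index <- function(lambda_path, s) {
  which.min(abs(lambda_path - s))
}

lambda_path <- c(1, 0.5, 0.1, 0.04, 0.01)  # made-up decreasing lambda sequence
idx <- closest_lambda_index(lambda_path, s = 0.05)
lambda_path[idx]
```

Under this reading, a call such as coef(p, s=0.05) would return the column of coefficients belonging to the grid value nearest to 0.05 rather than interpolating between grid points.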
cv.plasso uses the glmnet package to estimate the coefficient paths and cross-validates least squares Lasso and Post-Lasso.
cv.plasso(x, y, w = NULL, kf = 10, parallel = FALSE, ...)
x |
Matrix of covariates (number of observations times number of covariates matrix) |
y |
Vector of outcomes |
w |
Vector of weights |
kf |
Number of folds in k-fold cross-validation |
parallel |
Set as TRUE for parallelized cross-validation. Default is FALSE. |
... |
Pass |
cv.plasso object (using a list structure) including the base glmnet object and cross-validation results (incl. optimal lambda values) for both the Lasso and Post-Lasso models.
call |
the call that produced this |
lasso_full |
base |
kf |
number of folds in k-fold cross-validation |
cv_MSE_lasso |
cross-validated MSEs of Lasso model (for every iteration of k-fold cross-validation) |
cv_MSE_plasso |
cross-validated MSEs of Post-Lasso model (for every iteration of k-fold cross-validation) |
mean_MSE_lasso |
averaged cross-validated MSEs of Lasso model |
mean_MSE_plasso |
averaged cross-validated MSEs of Post-Lasso model |
ind_min_l |
index of MSE optimal lambda value for Lasso model |
ind_min_pl |
index of MSE optimal lambda value for Post-Lasso model |
lambda_min_l |
MSE optimal lambda value for Lasso model |
lambda_min_pl |
MSE optimal lambda value for Post-Lasso model |
names_l |
Names of active variables for MSE optimal Lasso model |
names_pl |
Names of active variables for MSE optimal Post-Lasso model |
coef_min_l |
Coefficients for MSE optimal Lasso model |
coef_min_pl |
Coefficients for MSE optimal Post-Lasso model |
x |
Input matrix of covariates |
y |
Matrix of outcomes |
w |
Matrix of weights |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get basic summary statistics
print(summary(p.cv, default=FALSE))
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")
# get coefficients at MSE optimal lambda value for both Lasso and Post-Lasso model
coef(p.cv)
# get coefficients at MSE optimal lambda value according to 1-standard-error rule
coef(p.cv, se_rule=-1)
# predict fitted values along whole lambda sequence
pred = predict(p.cv, s="all")
head(pred$plasso)
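The relationship between the per-fold MSEs, the averaged MSEs and the optimal lambda index in the returned list can be sketched in base R. The numbers below are made up, and the layout (one row per fold, one column per lambda value) is an assumption rather than something this documentation confirms.

```r
# Hypothetical inputs: 3 folds (rows) x 4 lambda values (columns).
lambda <- c(1.0, 0.5, 0.1, 0.01)
cv_mse <- rbind(c(4.0, 2.5, 2.0, 2.4),
                c(4.2, 2.7, 1.8, 2.2),
                c(3.8, 2.3, 2.2, 2.6))

mean_mse   <- colMeans(cv_mse)        # analogue of mean_MSE_lasso / mean_MSE_plasso
ind_min    <- which.min(mean_mse)     # analogue of ind_min_l / ind_min_pl
lambda_min <- lambda[ind_min]         # analogue of lambda_min_l / lambda_min_pl

# standard error of the mean MSE across folds, as used by standard-error rules
se_mse <- apply(cv_mse, 2, sd) / sqrt(nrow(cv_mse))
```

The same computation would be carried out once for the Lasso and once for the Post-Lasso path, which is why the two models can end up with different optimal lambda values.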
plasso implicitly estimates a Lasso model using the glmnet package and additionally estimates coefficient paths for a subsequent Post-Lasso model.
plasso(x, y, w = NULL, ...)
x |
Matrix of covariates (number of observations times number of covariates matrix) |
y |
Vector of outcomes |
w |
Vector of weights |
... |
Pass |
List including base glmnet (i.e. Lasso) object and Post-Lasso coefficients.
call |
the call that produced this |
lasso_full |
base |
beta_plasso |
matrix of coefficients for Post-Lasso model stored in sparse column format |
x |
Input matrix of covariates |
y |
Matrix of outcomes |
w |
Matrix of weights |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")
# get coefficients for specific lambda approximation
coef(p, s=0.05)
# predict fitted values along whole lambda sequence
pred = predict(p)
head(pred$plasso)
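Conceptually, Post-Lasso takes the variables selected by the Lasso at a given lambda and refits them by ordinary least squares without shrinkage. The base R sketch below illustrates that idea for a single lambda value. It is an illustration under stated assumptions, not the package's code: post_lasso_refit is a hypothetical helper, the Lasso coefficients are made up, and weights are ignored.

```r
# Hypothetical helper, not part of 'plasso': OLS refit on the Lasso-selected set.
post_lasso_refit <- function(X, y, beta_lasso) {
  active <- which(beta_lasso != 0)               # variables kept by the Lasso
  design <- cbind(1, X[, active, drop = FALSE])  # intercept plus active columns
  fit <- lm.fit(design, y)
  beta <- numeric(ncol(X) + 1)                   # intercept first, zeros elsewhere
  beta[c(1, active + 1)] <- fit$coefficients
  beta
}

set.seed(1)
X_demo <- matrix(rnorm(200 * 5), 200, 5)
y_demo <- 1 + 2 * X_demo[, 1] - X_demo[, 3] + rnorm(200)
beta_lasso <- c(1.6, 0, -0.7, 0, 0)  # made-up shrunken Lasso coefficients
b_post <- post_lasso_refit(X_demo, y_demo, beta_lasso)
b_post
```

Repeating such a refit for every lambda on the path would give one column of Post-Lasso coefficients per lambda value, which matches the shape described for beta_plasso.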
Plot of cross-validation curves.
## S3 method for class 'cv.plasso'
plot(
  x,
  ...,
  legend_pos = c("bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right", "center"),
  legend_size = 0.5,
  lasso = FALSE
)
x |
|
... |
Pass generic |
legend_pos |
Legend position. Only considered for the joint plot (lasso=FALSE). |
legend_size |
Font size of legend |
lasso |
If set to TRUE, only the cross-validation curve for the Lasso model is plotted. Default is FALSE. |
Plots the cross-validation curves for both Lasso and Post-Lasso models (incl. upper and lower standard deviation curves) for a fitted cv.plasso object.
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# plot cross-validated MSE curves and number of active coefficients
plot(p.cv, legend_pos="bottomleft")
Plot coefficient paths of (Post-) Lasso model.
## S3 method for class 'plasso'
plot(x, ..., lasso = FALSE, xvar = c("norm", "lambda", "dev"), label = FALSE)
x |
|
... |
Pass generic |
lasso |
If set to TRUE, coefficient paths for the Lasso instead of the Post-Lasso model are plotted. Default is FALSE. |
xvar |
X-axis variable: one of "norm", "lambda" or "dev". |
label |
If TRUE, label the curves with variable sequence numbers |
Produces a coefficient profile plot of the coefficient paths for a fitted plasso object.
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# plot coefficient paths for Post-Lasso model
plot(p, lasso=FALSE, xvar="lambda")
# plot coefficient paths for Lasso model
plot(p, lasso=TRUE, xvar="lambda")
Prediction for cross-validated (Post-) Lasso.
## S3 method for class 'cv.plasso'
predict(
  object,
  ...,
  newx = NULL,
  type = c("response", "coefficients"),
  s = c("optimal", "all"),
  se_rule = 0
)
object |
Fitted |
... |
Pass generic |
newx |
Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients". |
type |
Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates. |
s |
Determines whether prediction is done for all values of lambda ("all") or only for the optimal lambda ("optimal") according to the specified standard error-rule. |
se_rule |
If equal to 0, predictions from the cross-validated MSE minimum are returned (default). Negative values move in the direction of smaller models, positive values in the direction of larger models (e.g. se_rule = -1 gives the 1-standard-error rule). |
List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models respectively.
lasso |
Matrix with Lasso predictions or coefficients |
plasso |
Matrix with Post-Lasso predictions or coefficients |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# predict fitted values along whole lambda sequence
pred = predict(p.cv, s="all")
head(pred$plasso)
# predict fitted values for optimal lambda value (according to cross-validation)
pred_optimal = predict(p.cv, s="optimal")
head(pred_optimal$plasso)
# predict fitted values for new feature set X
X_new = head(X, 10)
pred_new = predict(p.cv, newx=X_new, s="optimal")
pred_new$plasso
# get estimated coefficients along whole lambda sequence
coefs = predict(p.cv, type="coefficients", s="all")
head(coefs$plasso)
# get estimated coefficients for optimal lambda value according to 1-standard-error rule
predict(p.cv, type="coefficients", s="optimal", se_rule=-1)
Prediction for (Post-) Lasso models.
## S3 method for class 'plasso'
predict(
  object,
  ...,
  newx = NULL,
  type = c("response", "coefficients"),
  s = NULL
)
object |
Fitted |
... |
Pass generic |
newx |
Matrix of new values for x at which predictions are to be made. If no value is supplied, x from fitting procedure is used. This argument is not used for type="coefficients". |
type |
Type of prediction required. "response" returns fitted values, "coefficients" returns beta estimates. |
s |
If NULL, prediction is done for all lambda values. If a value is provided, the closest lambda value of the plasso object is used. |
List object containing either fitted values or coefficients for both the Lasso and Post-Lasso models associated with all values along the lambda input sequence or for one specifically given lambda value.
lasso |
Matrix with Lasso predictions or coefficients |
plasso |
Matrix with Post-Lasso predictions or coefficients |
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit plasso to the data
p = plasso::plasso(X,y)
# predict fitted values along whole lambda sequence
pred = predict(p)
head(pred$plasso)
# get estimated coefficients for specific lambda approximation
predict(p, type="coefficients", s=0.05)
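For a linear model, fitted values with type="response" amount to multiplying the feature matrix (plus an intercept column) by the coefficients. The base R sketch below shows that arithmetic for a whole path at once. The objects coef_path and newx_demo are made up, and the layout (intercept in the first row, one column per lambda value) is an assumption modelled on glmnet-style coefficient matrices.

```r
# Hypothetical coefficient matrix: rows = intercept, x1, x2; columns = 3 lambda values.
coef_path <- cbind(c(0.5, 0, 0),
                   c(0.4, 1, 0),
                   c(0.3, 1.2, -0.5))
# Two made-up new observations with features x1 and x2.
newx_demo <- rbind(c(1, 2),
                   c(0, 1))
# One row per observation, one column per lambda value.
fit_values <- cbind(1, newx_demo) %*% coef_path
fit_values
```

Selecting a single column of fit_values corresponds to predicting at one specific lambda value, as with the s argument.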
Printing main insights from cross-validated (Post-) Lasso model.
## S3 method for class 'cv.plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))
x |
|
... |
Pass generic |
digits |
Integer, used for number formatting |
Prints basic statistics for different lambda values of a fitted plasso object, i.e. cross-validated MSEs for both Lasso and Post-Lasso model as well as the number of active variables.
Printing main insights from (Post-) Lasso model.
## S3 method for class 'plasso'
print(x, ..., digits = max(3, getOption("digits") - 3))
x |
|
... |
Pass generic |
digits |
Integer, used for number formatting |
Prints glmnet-like output.
Prints summary information of cv.plasso object.
## S3 method for class 'summary.cv.plasso'
print(x, ..., digits = max(3L, getOption("digits") - 3L))
x |
Summary of plasso object (either of class |
... |
Pass generic R |
digits |
Integer, used for number formatting |
Prints information from summary.cv.plasso object into console.
Summary of cross-validated (Post-) Lasso model.
## S3 method for class 'cv.plasso'
summary(object, ..., default = FALSE)
object |
|
... |
Pass generic |
default |
TRUE for |
For specific summary information: summary.cv.plasso object (using list structure) containing optimal lambda values and associated MSEs for both cross-validated Lasso and Post-Lasso model.
For default: summaryDefault object.
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
# get informative summary statistics
print(summary(p.cv, default=FALSE))
# set default=TRUE for standard summary statistics
print(summary(p.cv, default=TRUE))
Summary of (Post-) Lasso model.
## S3 method for class 'plasso'
summary(object, ...)
object |
|
... |
Pass generic |
Default summary object
Simulated data from a DGP with an underlying causal relationship between
covariates X and the target y.
The covariates matrix X consists of 10 variables whose effect size on target
y is defined by the vector
c(1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0)
with the first six effect sizes decreasing in absolute terms continuously
from 1 to 0 and alternating in their sign.
The true causal effect of all other covariates is 0.
The variables in X follow a normal distribution with mean zero while the
covariance matrix follows a Toeplitz matrix.
The target y is then a linear transformation of X plus a vector of standard
normal random variables (i.e. error term).
(See vignette for more details.)
data(toeplitz)
An object of class standardGeneric of length 1.
# load toeplitz data
data(toeplitz)
# extract target and features from data
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
# fit cv.plasso to the data
p.cv = plasso::cv.plasso(X,y)
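The data-generating process described for the toeplitz data can be imitated in base R. The sketch below is not the code that produced the shipped dataset: the sample size n, the correlation decay rho and the seed are assumptions chosen for illustration, and only the effect vector and the number of covariates are taken from the description above.

```r
set.seed(42)
n   <- 100   # assumed sample size
p   <- 10    # number of covariates, as described
rho <- 0.7   # assumed decay of the Toeplitz covariance

# effect sizes: six non-zero entries alternating in sign, remaining entries zero
beta <- c(1, -0.83, 0.67, -0.5, 0.33, -0.17, rep(0, p - 6))

# Toeplitz covariance matrix with entries rho^|i - j|
Sigma <- stats::toeplitz(rho^(0:(p - 1)))

# mean-zero normal covariates with covariance Sigma via the Cholesky factor
X_sim <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# linear outcome plus standard normal noise
y_sim <- drop(X_sim %*% beta) + rnorm(n)
```

The call uses stats::toeplitz explicitly because loading the dataset with data(toeplitz) places an object of the same name in the workspace.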