This function takes a set of existing prediction models and uses a new dataset to combine/aggregate them into a single 'meta-model', as described in Debray et al. 2014.

Usage

pred_stacked_regression(
  x,
  positivity_constraint = FALSE,
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL
)

Arguments

x

An object of class "predinfo", produced by calling pred_input_info, containing information on at least two existing prediction models.

positivity_constraint

TRUE/FALSE denoting if the weights within the stacked regression model should be constrained to be non-negative (TRUE) or should be allowed to take any value (FALSE). See details.

new_data

data.frame upon which the prediction models should be aggregated.

binary_outcome

Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for x$model_type="logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.

survival_time

Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for x$model_type="survival"; leave as NULL otherwise.

event_indicator

Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for x$model_type="survival"; leave as NULL otherwise.

Value

An object of class "predSR". This is the same as that detailed in pred_input_info, with an added element containing the estimates of the meta-model obtained by stacked regression.

Details

This function takes a set of (previously estimated) prediction models that were each originally developed for the same prediction task, and pools/aggregates these into a single prediction model (meta-model) using stacked regression based on new data (data not used to develop any of the existing models). The methodological details can be found in Debray et al. 2014.
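
As a rough illustration of the idea for logistic models, the sketch below regresses a simulated binary outcome on the linear predictors of two existing models and then folds the stacking weights back into a single set of coefficients. It uses base R only; the simulated data, the two sets of "existing" coefficients and the object names (LP1, LP2, sr_fit, meta_coefs) are assumptions made for illustration, and this is not the internal code of pred_stacked_regression.

```r
# Illustrative sketch only: simulated data and hypothetical existing models.
set.seed(1)
n <- 500
new_data <- data.frame(Age = rnorm(n, 60, 10), SexM = rbinom(n, 1, 0.5))
new_data$Y <- rbinom(n, 1, plogis(-4 + 0.05 * new_data$Age + 0.4 * new_data$SexM))

# Linear predictors of two hypothetical existing logistic models
new_data$LP1 <- -3.5 + 0.04 * new_data$Age + 0.6 * new_data$SexM
new_data$LP2 <- -4.2 + 0.06 * new_data$Age

# Stacked regression: regress the observed outcome on the existing linear predictors
sr_fit <- glm(Y ~ LP1 + LP2, family = binomial(), data = new_data)
w <- coef(sr_fit)

# Meta-model coefficients are weighted sums of the existing models' coefficients
meta_coefs <- c(
  Intercept = unname(w["(Intercept)"] + w["LP1"] * (-3.5) + w["LP2"] * (-4.2)),
  Age       = unname(w["LP1"] * 0.04 + w["LP2"] * 0.06),
  SexM      = unname(w["LP1"] * 0.6)
)
meta_coefs
```

The meta-model's linear predictor, computed from meta_coefs, is the same as the linear predictor of the stacked fit, so the aggregated model can be reported as a single set of coefficients.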

Given that the existing models are likely to be highly collinear (since they were each developed for the same prediction task), it has been suggested to impose a positivity constraint on the weights of the stacked regression model (Debray et al. 2014). If positivity_constraint is set to TRUE, then the stacked regression model will be estimated by optimising the (log-)likelihood using bound-constrained optimisation ("L-BFGS-B"). This is currently only implemented for logistic regression models (i.e., if x$model_type="logistic"). For survival models, positivity_constraint is set to FALSE.
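
To make the constrained fit concrete, the following sketch minimises the negative logistic log-likelihood with optim(method = "L-BFGS-B"), bounding the weights on the linear predictors below by zero while leaving the intercept free. The simulated linear predictors, starting values and helper names (neg_loglik, neg_grad) are hypothetical; the package's own implementation may differ in its details.

```r
# Illustrative sketch only: two highly correlated, simulated linear predictors.
set.seed(2)
n <- 500
LP1 <- rnorm(n, -1, 1)
LP2 <- LP1 + rnorm(n, 0, 0.3)
y <- rbinom(n, 1, plogis(0.8 * LP1))

X <- cbind(1, LP1, LP2)  # design matrix: intercept, LP1, LP2

# Negative Bernoulli log-likelihood with a logit link, and its gradient
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}
neg_grad <- function(beta) {
  eta <- drop(X %*% beta)
  -drop(crossprod(X, y - plogis(eta)))
}

# Intercept unconstrained; weights on LP1 and LP2 constrained to be >= 0
fit <- optim(
  par = c(0, 0.5, 0.5),
  fn = neg_loglik,
  gr = neg_grad,
  method = "L-BFGS-B",
  lower = c(-Inf, 0, 0)
)
weights <- setNames(fit$par, c("(Intercept)", "LP1", "LP2"))
weights
```

With strongly correlated linear predictors, an unconstrained fit can assign a negative weight to one model; the lower bound rules this out, at the cost of possibly setting a weight to exactly zero.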

new_data should be a data.frame, where each row should be an observation (e.g. patient) and each variable/column should be a predictor variable. The predictor variables need to include (as a minimum) all of the predictor variables that are included in the existing prediction models (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data).
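
A quick way to check this requirement before calling the function is to compare the required predictor names against the columns of new_data. In the sketch below, required_vars is a hypothetical character vector that the user writes out by hand (here using the predictor names from the Examples), and the small data.frame is made up for illustration.

```r
# Illustrative sketch only: hand-written predictor names and a toy data.frame.
required_vars <- c("Age", "SexM", "Smoking_Status", "Diabetes", "Creatinine")

new_data <- data.frame(
  Age = c(54, 67),
  SexM = c(1, 0),
  Smoking_Status = c(0, 1),
  Diabetes = c(0, 0)
)

# Names required by the existing models but absent from new_data
missing_vars <- setdiff(required_vars, names(new_data))
if (length(missing_vars) > 0) {
  message("Missing from new_data: ", paste(missing_vars, collapse = ", "))
}
```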

Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function. dummy_vars can help with this. See pred_predict for examples.
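
For readers who want to see what the conversion involves, here is a minimal base R sketch using model.matrix rather than the package's dummy_vars helper. The toy data and the factor Sex are assumptions for illustration; the resulting column name SexM follows R's default factor-level naming.

```r
# Illustrative sketch only: base R alternative to dummy_vars, on toy data.
raw_data <- data.frame(
  Age = c(54, 67, 71),
  Sex = factor(c("M", "F", "M"), levels = c("F", "M"))
)

# model.matrix() expands the factor into 0/1 columns; drop the intercept column
dummies <- model.matrix(~ Sex, data = raw_data)[, -1, drop = FALSE]

# Combine the numeric predictor with the dummy column (named "SexM")
new_data <- cbind(raw_data["Age"], dummies)
new_data
```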

binary_outcome, survival_time and event_indicator are used to specify the outcome variable(s) within new_data (use binary_outcome if x$model_type = "logistic", or use survival_time and event_indicator if x$model_type = "survival").

References

Debray, T.P., Koffijberg, H., Nieboer, D., Vergouwe, Y., Steyerberg, E.W. and Moons, K.G. (2014), Meta-analysis and aggregation of multiple published prediction models. Statistics in Medicine, 33: 2341-2362

Examples

LogisticModels <- pred_input_info(model_type = "logistic",
                                  model_info = SYNPM$Existing_logistic_models)
SR <- pred_stacked_regression(x = LogisticModels,
                              new_data = SYNPM$ValidationData,
                              binary_outcome = "Y")
summary(SR)
#> Existing models aggregated using stacked regression
#> The model stacked regression weights are as follows: 
#> (Intercept)         LP1         LP2         LP3 
#>  0.02781941  0.46448799  0.15626108  0.16282116 
#> 
#> Updated Model Coefficients 
#> ================================= 
#>   Intercept         Age      SexM Smoking_Status  Diabetes Creatinine
#> 1 -2.675134 0.005345728 0.1589948      0.5233706 0.2543348  0.4554044
#> 
#> Model Functional Form 
#> ================================= 
#> Age + SexM + Smoking_Status + Diabetes + Creatinine

# Survival model example:
TTModels <- pred_input_info(model_type = "survival",
                            model_info = SYNPM$Existing_TTE_models,
                            cum_hazard = list(SYNPM$TTE_mod1_baseline,
                                                  SYNPM$TTE_mod2_baseline,
                                                  SYNPM$TTE_mod3_baseline))
SR <- pred_stacked_regression(x = TTModels,
                              new_data = SYNPM$ValidationData,
                              survival_time = "ETime",
                              event_indicator = "Status")
summary(SR)
#> Existing models aggregated using stacked regression
#> The model stacked regression weights are as follows: 
#>        LP1        LP2        LP3 
#> -0.2707658  1.8832932 -0.4488339 
#> 
#> The new model baseline cumulative hazard is: 
#>           time       hazard
#> 1 2.021278e-06 5.338425e-06
#> 2 1.630775e-05 1.067721e-05
#> 3 3.600450e-05 1.601631e-05
#> 4 4.006704e-05 2.135604e-05
#> 5 6.484743e-05 2.669604e-05
#> 6 1.216613e-04 3.203626e-05
#> ...
#> 
#> Updated Model Coefficients 
#> ================================= 
#>          Age      SexM Smoking_Status  Diabetes Creatinine
#> 1 0.03363821 0.2725367      0.5354202 0.1595384  0.3142822
#> 
#> Model Functional Form 
#> ================================= 
#> Age + SexM + Smoking_Status + Diabetes + Creatinine