This function takes an existing (previously developed) prediction model and applies various model updating methods to tailor/adapt it to a new dataset. Various levels of updating are possible, ranging from model re-calibration to model refit.

Usage

pred_update(
  x,
  update_type = c("intercept_update", "recalibration", "refit"),
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL
)

Arguments

x

an object of class "predinfo" produced by calling pred_input_info containing information on exactly one existing prediction model.

update_type

character variable specifying the level of updating that is required.

new_data

data.frame upon which the prediction model should be updated.

binary_outcome

Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for x$model_type="logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.

survival_time

Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for x$model_type="survival"; leave as NULL otherwise.

event_indicator

Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for x$model_type="survival"; leave as NULL otherwise.

Value

An object of class "predUpdate". This is the same as that detailed in pred_input_info, with an added element containing the estimates of the model updating and the update type.

Details

This function takes a single existing (previously estimated) prediction model and applies various discrete model updating methods (see Su et al. 2018) to tailor the model to a new dataset.

The type of updating method is selected with the update_type parameter, with options: "intercept_update", "recalibration" and "refit". "intercept_update" corrects the overall calibration-in-the-large of the model by altering the model intercept (or baseline hazard) to suit the new dataset. This is achieved by fitting a logistic model (if the existing model is of type logistic) or time-to-event model (if the existing model is of type survival) to the new dataset, with the linear predictor as the only covariate and its coefficient fixed at unity (i.e. as an offset). "recalibration" corrects the calibration-in-the-large and any under/over-fitting by fitting a logistic model (if the existing model is of type logistic) or time-to-event model (if the existing model is of type survival) to the new dataset, with the linear predictor as the only covariate. Finally, "refit" takes the original model structure and re-estimates all coefficients; this has the same effect as re-developing the original model in the new data.
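The three levels of updating described above can be illustrated for the logistic case with base R alone. The following is a minimal sketch on simulated data, not the package's internal code: the variable names (age, smoker), the "existing" coefficients and the data-generating values are all assumptions made up for illustration.

```r
# Simulated stand-in for a new dataset; all names and values are illustrative.
set.seed(42)
n <- 2000
age <- rnorm(n, mean = 60, sd = 10)
smoker <- rbinom(n, size = 1, prob = 0.3)

# Linear predictor of a hypothetical existing logistic model
# (coefficients are assumed, not taken from the package).
lp <- -3.0 + 0.03 * age + 0.6 * smoker

# Outcomes generated so that the existing model is miscalibrated here.
y <- rbinom(n, size = 1, prob = plogis(0.5 + 0.7 * lp))

# "intercept_update": slope on the linear predictor fixed at 1 via an offset,
# so only the intercept correction is estimated.
fit_intercept <- glm(y ~ offset(lp), family = binomial)

# "recalibration": intercept and slope on the linear predictor are estimated.
fit_recal <- glm(y ~ lp, family = binomial)

# "refit": all coefficients re-estimated using the original model structure.
fit_refit <- glm(y ~ age + smoker, family = binomial)

coef(fit_intercept)  # a single intercept correction
coef(fit_recal)      # intercept and calibration slope
coef(fit_refit)      # new intercept and predictor coefficients
```

Because each model is nested within the next, the residual deviance cannot increase when moving from the intercept update to recalibration to refit; the price is that more parameters are estimated from the new data.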

new_data should be a data.frame, where each row is an observation (e.g. patient) and each column is a variable. The columns need to include (as a minimum) all of the predictor variables that are included in the existing prediction model (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data).
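One way to check this name-matching requirement before calling the function is a simple set difference in base R. This is only a sketch: model_vars is a hypothetical vector holding the predictor names (here copied from the example output on this page), and the toy new_data is invented for illustration.

```r
# Predictor names of the existing model (hypothetical helper vector).
model_vars <- c("Age", "SexM", "Smoking_Status", "Diabetes", "Creatinine")

# Toy new dataset that is deliberately missing one predictor.
new_data <- data.frame(Age = c(54, 67),
                       SexM = c(1, 0),
                       Smoking_Status = c(0, 1),
                       Diabetes = c(0, 0))

# Names required by the model that do not appear as columns of new_data.
missing_vars <- setdiff(model_vars, names(new_data))
if (length(missing_vars) > 0) {
  message("Missing from new_data: ", paste(missing_vars, collapse = ", "))
}
```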

Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function. dummy_vars can help with this. See pred_predict for examples.
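As an alternative illustration of what this conversion involves, base R's model.matrix() can expand a factor into 0/1 indicator columns. This sketch does not use dummy_vars, and the toy data and column names are assumptions for illustration only.

```r
# Toy data with one factor variable; "F" is the reference level.
new_data <- data.frame(
  Age = c(54, 67, 71),
  Sex = factor(c("M", "F", "M"), levels = c("F", "M"))
)

# model.matrix() creates an intercept column plus one indicator column per
# non-reference level; drop the intercept and keep the indicators.
dummies <- model.matrix(~ Sex, data = new_data)[, -1, drop = FALSE]

# Combine the untouched columns with the new indicator columns.
new_data_dummy <- cbind(new_data["Age"], dummies)
names(new_data_dummy)
```

The resulting indicator column is named by pasting the factor name and level together (SexM here), which is the naming style that appears in the example model coefficients on this page.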

binary_outcome, survival_time and event_indicator are used to specify the outcome variable(s) within new_data (use binary_outcome if x$model_type = "logistic", or use survival_time and event_indicator if x$model_type = "survival").

References

Su TL, Jaki T, Hickey GL, Buchan I, Sperrin M. A review of statistical updating methods for clinical prediction models. Stat Methods Med Res. 2018 Jan;27(1):185-197. doi: 10.1177/0962280215626466.

Examples

#Example 1 - update time-to-event model by updating the baseline hazard in new dataset
model1 <- pred_input_info(model_type = "survival",
                          model_info = SYNPM$Existing_TTE_models[1,],
                          cum_hazard = SYNPM$TTE_mod1_baseline)
recalibrated_model1 <- pred_update(x = model1,
                                   update_type = "intercept_update",
                                   new_data = SYNPM$ValidationData,
                                   survival_time = "ETime",
                                   event_indicator = "Status")
summary(recalibrated_model1)
#> Original model was updated with type intercept_update
#> The new model baseline cumulative hazard is: 
#>           time       hazard
#> 1 2.021278e-06 1.498399e-05
#> 2 1.630775e-05 2.996905e-05
#> 3 3.600450e-05 4.495490e-05
#> 4 4.006704e-05 5.994284e-05
#> 5 6.484743e-05 7.493149e-05
#> 6 1.216613e-04 8.992081e-05
#> ...
#> 
#> Updated Model Coefficients 
#> ================================= 
#>           Age      SexM Smoking_Status  Diabetes Creatinine
#> 1 0.007014587 0.2249174      0.6852695 0.4245074   0.587486
#> 
#> Model Functional Form 
#> ================================= 
#> Age + SexM + Smoking_Status + Diabetes + Creatinine