This function takes an existing (previously developed) prediction model and applies various model updating methods to tailor/adapt it to a new dataset. Various levels of updating are possible, ranging from model re-calibration to model refit.
Usage
pred_update(
x,
update_type = c("intercept_update", "recalibration", "refit"),
new_data,
binary_outcome = NULL,
survival_time = NULL,
event_indicator = NULL
)
Arguments
- x
an object of class "predinfo" produced by calling pred_input_info, containing information on exactly one existing prediction model.
- update_type
character variable specifying the level of updating that is required.
- new_data
data.frame upon which the prediction models should be updated.
- binary_outcome
Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for model_type = "logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.
- survival_time
Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for x$model_type = "survival"; leave as NULL otherwise.
- event_indicator
Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for x$model_type = "survival"; leave as NULL otherwise.
Value
An object of class "predUpdate". This is the same as that detailed in pred_input_info, with an added element containing the estimates of the model updating and the update type.
Details
This function takes a single existing (previously estimated) prediction model and applies various discrete model updating methods (see Su et al. 2018) to tailor the model to a new dataset.
The type of updating method is selected with the update_type parameter, with options: "intercept_update", "recalibration" and "refit".
"intercept_update" corrects the overall calibration-in-the-large of the
model by altering the model intercept (or baseline hazard) to suit the new
dataset. This is achieved by fitting a logistic model (if the existing
model is of type logistic) or a time-to-event model (if the existing model
is of type survival) to the new dataset, with the linear predictor as the
only covariate and its coefficient fixed at unity (i.e. as an offset).
"recalibration" corrects the calibration-in-the-large and any
under/over-fitting by fitting a logistic model (if the existing model is of
type logistic) or a time-to-event model (if the existing model is of type
survival) to the new dataset, with the linear predictor as the only
covariate and its coefficient estimated freely. Finally, "refit" takes the
original model structure and re-estimates all coefficients; this has the
same effect as re-developing the original model in the new data.
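The three levels of updating described above can be illustrated for a logistic model with base R alone. The following is a minimal sketch on simulated data, not the internal implementation of pred_update; the variable names (x1, x2, lp) and the coefficients of the "existing" model are hypothetical.

```r
# Simulate a "new" dataset in which a hypothetical existing model is miscalibrated.
set.seed(42)
n  <- 2000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
y  <- rbinom(n, 1, plogis(-0.5 + 0.6 * x1 + 0.3 * x2))
new_data <- data.frame(x1 = x1, x2 = x2, y = y)

# Linear predictor of the hypothetical existing model
# (intercept -1.5, coefficients 1.0 and 0.5).
new_data$lp <- -1.5 + 1.0 * new_data$x1 + 0.5 * new_data$x2

# Intercept update: only an intercept correction is estimated;
# the coefficient of the linear predictor is fixed at 1 via an offset.
fit_int <- glm(y ~ 1 + offset(lp), family = binomial, data = new_data)

# Recalibration: both an intercept and a slope on the linear predictor are estimated.
fit_recal <- glm(y ~ lp, family = binomial, data = new_data)

# Refit: all coefficients of the original model structure are re-estimated.
fit_refit <- glm(y ~ x1 + x2, family = binomial, data = new_data)

coef(fit_int)    # one parameter: the intercept correction
coef(fit_recal)  # two parameters: calibration intercept and slope
coef(fit_refit)  # three parameters: intercept, x1 and x2
```

By construction, the simulated outcome follows 0.4 + 0.6 * lp on the logit scale, so the recalibration slope should be estimated at roughly 0.6; a slope below 1 of this kind is what one would expect when the existing model's predictions are too extreme for the new data.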
new_data should be a data.frame, where each row is an observation (e.g.
patient) and each column is a variable. The columns need to include (as a
minimum) all of the predictor variables that are included in the existing
prediction model (i.e., each of the variable names supplied to
pred_input_info, through the model_info parameter, must match the name of a
variable in new_data).
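Before calling the function, it can be useful to confirm that every predictor named in the model specification has a matching column in new_data. The check below is a hedged sketch in base R; the layout of model_info used here (one row, a column named Intercept plus one column per predictor) and all values are assumptions made for illustration.

```r
# Hypothetical model specification: an intercept column plus one column per predictor.
model_info <- data.frame(Intercept = -3.0, Age = 0.007, SexM = 0.22)

# Hypothetical new dataset; extra columns (here BMI) are harmless.
new_data <- data.frame(Age = c(54, 61), SexM = c(1, 0), BMI = c(27.1, 31.4))

# Predictor names required by the model (everything except the intercept).
required <- setdiff(names(model_info), "Intercept")

# Predictors that have no matching column in new_data.
missing_vars <- setdiff(required, names(new_data))

if (length(missing_vars) > 0) {
  stop("new_data is missing predictors: ", paste(missing_vars, collapse = ", "))
}
```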
Any factor variables within new_data must be converted to dummy (0/1)
variables before calling this function. dummy_vars can help with this. See
pred_predict for examples.
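To show what the conversion to dummy variables involves, the sketch below uses base R's model.matrix rather than the package's dummy_vars helper, so it is an illustration of the idea only. The column names and factor levels are hypothetical, and the resulting dummy column names may need renaming to match those used in the existing model.

```r
# Hypothetical new dataset with one factor predictor.
new_data <- data.frame(
  Age     = c(54, 61, 47),
  Smoking = factor(c("never", "current", "former"),
                   levels = c("never", "former", "current"))
)

# model.matrix() expands the factor into 0/1 indicator columns;
# dropping the first column removes the intercept, leaving one dummy
# per non-reference level ("never" is the reference level here).
dummies <- model.matrix(~ Smoking, data = new_data)[, -1, drop = FALSE]

# Combine the untouched numeric predictor with the new dummy columns.
new_data_dummy <- cbind(new_data["Age"], dummies)
new_data_dummy
```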
binary_outcome, survival_time and event_indicator are used to specify the
outcome variable(s) within new_data (use binary_outcome if x$model_type =
"logistic", or use survival_time and event_indicator if x$model_type =
"survival").
References
Su TL, Jaki T, Hickey GL, Buchan I, Sperrin M. A review of statistical updating methods for clinical prediction models. Stat Methods Med Res. 2018 Jan;27(1):185-197. doi: 10.1177/0962280215626466.
Examples
# Example 1 - update time-to-event model by updating the baseline hazard in new dataset
model1 <- pred_input_info(model_type = "survival",
                          model_info = SYNPM$Existing_TTE_models[1,],
                          cum_hazard = SYNPM$TTE_mod1_baseline)
recalibrated_model1 <- pred_update(x = model1,
                                   update_type = "intercept_update",
                                   new_data = SYNPM$ValidationData,
                                   survival_time = "ETime",
                                   event_indicator = "Status")
summary(recalibrated_model1)
#> Original model was updated with type intercept_update
#> The new model baseline cumulative hazard is:
#> time hazard
#> 1 2.021278e-06 1.498399e-05
#> 2 1.630775e-05 2.996905e-05
#> 3 3.600450e-05 4.495490e-05
#> 4 4.006704e-05 5.994284e-05
#> 5 6.484743e-05 7.493149e-05
#> 6 1.216613e-04 8.992081e-05
#> ...
#>
#> Updated Model Coefficients
#> =================================
#> Age SexM Smoking_Status Diabetes Creatinine
#> 1 0.007014587 0.2249174 0.6852695 0.4245074 0.587486
#>
#> Model Functional Form
#> =================================
#> Age + SexM + Smoking_Status + Diabetes + Creatinine