Input information about an existing prediction model - pred_input_info

Input coefficient information about one or multiple existing prediction model(s), for use in other functions in the package.

Usage

pred_input_info(
  model_type = c("logistic", "survival"),
  model_info,
  cum_hazard = NULL
)

Arguments

model_type

specifies the type of model that the existing prediction model is based on; possible options are:

"logistic" indicates that the existing model was based on a logistic regression model (default)
"survival" indicates that the existing model was based on a survival regression model

If multiple models are entered, all of them must be of the same type; otherwise, call the function once for each type of model.

model_info

a data.frame that contains the coefficients of the existing prediction model(s). Each column should be a predictor variable (the column name being the name of the predictor variable), and the values should be the coefficients, taken exactly as published for the existing prediction model(s). Multiple existing prediction models should be specified as multiple rows, one per model. If a predictor variable is not present in a given model, enter that cell of the data.frame as NA. See examples.

cum_hazard

A data.frame with two columns: (1) time, and (2) the estimated cumulative baseline hazard at that time. The first column (time) should be named 'time' and the second (cumulative baseline hazard) should be named 'hazard'. Only relevant if model_type is "survival"; leave as NULL otherwise. If multiple existing models are entered and model_type = "survival", then cum_hazard should be supplied as a list whose length equals the number of models.
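A minimal sketch of the expected shape of cum_hazard, using hypothetical time points and hazard values (not taken from any real model): one data.frame with columns named time and hazard per model, collected into a list when several survival models are entered. The ordering of the list is assumed here to follow the row order of model_info.

```r
# Hypothetical cumulative baseline hazard for a first survival model
cum_hazard_1 <- data.frame(
  time   = c(1, 2, 3, 4, 5),
  hazard = c(0.010, 0.021, 0.035, 0.048, 0.060)
)

# Hypothetical cumulative baseline hazard for a second survival model
cum_hazard_2 <- data.frame(
  time   = c(1, 2, 3, 4, 5),
  hazard = c(0.008, 0.017, 0.029, 0.040, 0.052)
)

# For multiple models, supply a list with one data.frame per model
# (assumed to follow the row order of model_info)
cum_hazard_list <- list(cum_hazard_1, cum_hazard_2)

# A cumulative hazard should never decrease over time
stopifnot(all(diff(cum_hazard_1$hazard) >= 0))
stopifnot(all(diff(cum_hazard_2$hazard) >= 0))
```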

Value

pred_input_info returns an object of class "predinfo", with a child class determined by model_type. This standardised format allows the object to be used with other functions in the package. An object of class "predinfo" is a list containing the following components:

M = the number of existing models that information has been entered about
model_type = this is the type of model that the existing prediction model is based upon ("logistic" or "survival")
coefs = this is the set of (previously estimated) coefficients for each predictor variable
coef_names = gives the names of each predictor variable
formula = this is the functional form of the model's linear predictor
cum_hazard = if supplied, this is the cumulative baseline hazard of the existing model(s)
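The formula component is the model's linear predictor written as an R formula. As a rough sketch (not the package's actual code), a one-sided formula of the same form as the one printed in Example 1 can be built from the coefficient names with base R's reformulate(), dropping the intercept column:

```r
coefs_table <- data.frame("Intercept" = -3.4,
                          "SexM" = 0.306,
                          "Smoking_Status" = 0.628,
                          "Diabetes" = 0.499,
                          "CKD" = 0.538)

# Predictor names are every column except the intercept
coef_names <- setdiff(names(coefs_table), "Intercept")

# Build a one-sided formula from the predictor names
lp_formula <- reformulate(coef_names)
print(lp_formula)
```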

Details

This function will structure the relevant information about one or more existing prediction model(s) into a standardised format, such that it can be used within other functions in the package.

First, each existing prediction model has a functional form (i.e. the linear predictor of the model); this is taken to be a linear combination of the variables specified by the columns of model_info.
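To make the linear combination concrete, the sketch below computes the linear predictor by hand for two hypothetical individuals, using the predictor coefficients from Example 1. The covariate values are invented for illustration, and this mirrors the idea described above rather than the package's internal code.

```r
# Predictor coefficients from Example 1 (intercept handled separately)
beta <- c(SexM = 0.306, Smoking_Status = 0.628, Diabetes = 0.499, CKD = 0.538)

# Hypothetical new data whose column names match the predictor names
newdata <- data.frame(SexM = c(1, 0),
                      Smoking_Status = c(1, 0),
                      Diabetes = c(0, 1),
                      CKD = c(0, 0))

# Linear predictor: sum of coefficient * predictor value for each row
X <- as.matrix(newdata[, names(beta)])
lp <- as.vector(X %*% beta)
print(lp)
```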

Second, each predictor variable of the existing prediction model(s) has a published coefficient (e.g. a log-odds ratio or log-hazard ratio), and these coefficients should be given as the values of model_info. If entering information about multiple existing prediction models, model_info will contain multiple rows (one per existing model). If a given model does not contain a predictor variable that is included in another model, set the corresponding cell to NA; see the examples below.
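When several models are entered, the non-missing cells of each row define that model's predictors. The hedged sketch below builds a multi-row model_info with hypothetical coefficients and splits it into per-model coefficient vectors, analogous to the per-model listing printed in Example 3; it is an illustration, not the package's implementation.

```r
# Hypothetical coefficients for three logistic models; NA = predictor not in model
model_info <- data.frame(
  Intercept      = c(-3.5, -2.0, -3.0),
  Age            = c(0.01, NA, NA),
  SexM           = c(0.27, 0.22, NA),
  Smoking_Status = c(0.75, 0.53, 0.56)
)

# For each row (model), keep only the non-NA coefficients as a named vector
per_model <- lapply(seq_len(nrow(model_info)), function(i) {
  row <- unlist(model_info[i, ])
  row[!is.na(row)]
})

# Predictors (excluding the intercept) used by each model
lapply(per_model, function(b) setdiff(names(b), "Intercept"))
```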

If model_type = "logistic", then model_info must contain a column named "Intercept", giving the intercept coefficient of each existing logistic regression model (taken exactly as previously published); this should be the first column of model_info.
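For a logistic model, the intercept is added to the linear predictor and the result is mapped to a probability with the inverse logit. A minimal sketch using the coefficients from Example 1 and a hypothetical individual, assuming the standard logistic form (this is not a call to the package):

```r
coefs <- c(Intercept = -3.4, SexM = 0.306, Smoking_Status = 0.628,
           Diabetes = 0.499, CKD = 0.538)

# Hypothetical individual: male smoker without diabetes or CKD
x <- c(SexM = 1, Smoking_Status = 1, Diabetes = 0, CKD = 0)

# Linear predictor including the intercept
lp <- coefs[["Intercept"]] + sum(coefs[names(x)] * x)

# Predicted risk via the inverse logit, 1 / (1 + exp(-lp))
risk <- plogis(lp)
print(risk)
```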

If model_type = "survival", the baseline cumulative hazard of the model(s) can be specified in cum_hazard. If the baseline cumulative hazard of the existing survival model is not available, leave cum_hazard as NULL; note that this limits the validation metrics that can be calculated.
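One reason the cumulative baseline hazard matters: if the existing survival model is a proportional hazards model (an assumption made for this sketch), it turns a linear predictor into a predicted survival probability at time t via S(t | x) = exp(-H0(t) * exp(LP)). The coefficients, covariate values and hazard values below are all hypothetical.

```r
# Hypothetical cumulative baseline hazard (columns named as cum_hazard expects)
cum_hazard <- data.frame(time = c(1, 2, 3, 4, 5),
                         hazard = c(0.010, 0.021, 0.035, 0.048, 0.060))

# Hypothetical log-hazard-ratios and one individual's predictor values
beta <- c(Age = 0.02, SexM = 0.20)
x <- c(Age = 60, SexM = 1)

# Linear predictor (no intercept in a proportional hazards model)
lp <- sum(beta * x[names(beta)])

# Baseline cumulative hazard at the time of interest (here t = 5)
H0_t <- cum_hazard$hazard[cum_hazard$time == 5]

# Survival probability and event risk by time t under proportional hazards
surv_prob <- exp(-H0_t * exp(lp))
event_risk <- 1 - surv_prob
print(c(surv = surv_prob, risk = event_risk))
```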

Note that the column names of model_info should match the column names of any new data that the existing model(s) will be applied to (i.e. any new data provided to other functions within the package should contain the predictor variables entered through model_info). See pred_predict, pred_validate, pred_update and pred_stacked_regression for more information.
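Because the column names of model_info should match the new data, it can help to check this before calling other functions. The helper check_predictors() below is hypothetical (it is not part of the package) and simply reports which predictors are absent from a data.frame:

```r
# Hypothetical helper: which predictors in model_info are absent from newdata?
check_predictors <- function(model_info, newdata) {
  predictors <- setdiff(names(model_info), "Intercept")
  setdiff(predictors, names(newdata))
}

coefs_table <- data.frame(Intercept = -3.4, SexM = 0.306,
                          Smoking_Status = 0.628, Diabetes = 0.499, CKD = 0.538)

# Hypothetical new data that lacks the CKD column
newdata <- data.frame(SexM = c(1, 0), Smoking_Status = c(0, 1),
                      Diabetes = c(0, 0))

absent <- check_predictors(coefs_table, newdata)
print(absent)  # "CKD"
```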

Examples

#Example 1 - logistic regression existing model
# create a data.frame of the model coefficients, with columns being variables
coefs_table <- data.frame("Intercept" = -3.4,
                          "SexM" = 0.306,
                          "Smoking_Status" = 0.628,
                          "Diabetes" = 0.499,
                          "CKD" = 0.538)
#pass this into pred_input_info()
Existing_Logistic_Model <- pred_input_info(model_type = "logistic",
                                           model_info = coefs_table)
summary(Existing_Logistic_Model)
#> Information about 1 existing model(s) of type 'logistic' 
#> 
#> Model Coefficients 
#> ================================= 
#>   Intercept  SexM Smoking_Status Diabetes   CKD
#> 1      -3.4 0.306          0.628    0.499 0.538
#> 
#> Model Functional Form 
#> ================================= 
#> SexM + Smoking_Status + Diabetes + CKD

#Example 2 - survival model example; uses an example dataset within the
#             package.
pred_input_info(model_type = "survival",
                model_info = SYNPM$Existing_TTE_models[2,],
                cum_hazard = SYNPM$TTE_mod2_baseline)
#> 
#>  Formula: 
#> ~Age + SexM + Smoking_Status + Diabetes + Creatinine
#> <environment: 0x56186543b288>
#> 
#>  Coefficients: 
#>          Age      SexM Smoking_Status  Diabetes Creatinine
#> 2 0.02089659 0.2038455      0.5118238 0.1457449  0.3938243

#Example 3 - Input information about multiple models
summary(pred_input_info(model_type = "logistic",
                        model_info = SYNPM$Existing_logistic_models))
#> Information about 3 existing model(s) of type 'logistic' 
#> 
#> Model Coefficients 
#> ================================= 
#> [[1]]
#>   Intercept        Age      SexM Smoking_Status  Diabetes Creatinine
#> 1 -3.995452 0.01150886 0.2673809      0.7511888 0.5230999  0.5781238
#> 
#> [[2]]
#>   Intercept      SexM Smoking_Status  Diabetes Creatinine
#> 2 -2.281796 0.2227019      0.5281743 0.2002136  0.4337511
#> 
#> [[3]]
#>   Intercept Smoking_Status   Diabetes Creatinine
#> 3 -3.012865      0.5645417 -0.1223695  0.7314432
#> 
#> 
#> Model Functional Form 
#> ================================= 
#> Model 1: Age + SexM + Smoking_Status + Diabetes + Creatinine
#> Model 2: SexM + Smoking_Status + Diabetes + Creatinine
#> Model 3: Smoking_Status + Diabetes + Creatinine