Input coefficient information about one or multiple existing prediction model(s), for use in other functions in the package.
Usage
pred_input_info(
model_type = c("logistic", "survival"),
model_info,
cum_hazard = NULL
)
Arguments
- model_type
specifies the type of model that the existing prediction model is based on; possible options are:
"logistic"
indicates that the existing model was based on a logistic regression model (default)
"survival"
indicates that the existing model was based on a survival regression model
If multiple models are being entered, then all models need to be of the same type; otherwise, call the function separately for each type of model.
- model_info
a data.frame that contains the coefficients of the existing prediction model(s). Each column should be a predictor variable (with the name of the column being the name of the predictor variable), with the values being the coefficients, taken exactly as published from the existing prediction model(s). Multiple existing prediction models should be specified by entering multiple rows. If a predictor variable is not present in a given model then enter that cell of the data.frame as NA. See examples.
- cum_hazard
A data.frame with two columns: (1) time, and (2) estimated cumulative baseline hazard at that time. The first column (time) should be named 'time' and the second (cumulative baseline hazard) should be named 'hazard'. Only relevant if model_type is "survival"; leave as NULL otherwise. If multiple existing models are entered and model_type = "survival", then cum_hazard should be supplied as a list of length equal to the number of models.
Value
pred_input_info returns an object of class "predinfo", with child classes
per model_type. This is a standardised format, such that it can be used
with other functions in the package. An object of class "predinfo" is a
list containing the following components:
M = the number of existing models that information has been entered about
model_type = this is the type of model that the existing prediction model is based upon ("logistic" or "survival")
coefs = this is the set of (previously estimated) coefficients for each predictor variable
coef_names = gives the names of each predictor variable
formula = this is the functional form of the model's linear predictor
cum_hazard = if supplied, this is the cumulative baseline hazard of the existing model(s)
Details
This function will structure the relevant information about one or more existing prediction model(s) into a standardised format, such that it can be used within other functions in the package.
First, the existing prediction model(s) will have a functional form (i.e.
the linear predictor of the model); this will be taken as being a linear
combination of the variables specified by the columns of model_info.
Second, each of the predictor variables of the existing prediction model(s)
will have a published coefficient (e.g. log-odds-ratio or
log-hazard-ratio), which should each be given as the values of model_info.
If entering information about multiple existing prediction models, then
model_info will contain multiple rows (one per existing model). Here, if a
given model does not contain a predictor variable that is included in
another model, then set the corresponding cell to NA; see the examples
below.
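The layout described above can be sketched in base R as follows. The coefficient values and the variable names Age and SexM are hypothetical, chosen only for illustration; the loop simply shows which predictors each row (model) contains and is not the package's internal code.

```r
# Two hypothetical logistic models; the second one has no Age term
multi_coefs <- data.frame(
  Intercept = c(-3.9, -2.3),
  Age       = c(0.012, NA),   # NA: Age is not a predictor in model 2
  SexM      = c(0.27, 0.22)
)

# Predictors actually present in each model (one row per model)
predictors <- setdiff(names(multi_coefs), "Intercept")
in_model <- lapply(seq_len(nrow(multi_coefs)), function(i) {
  predictors[!is.na(unlist(multi_coefs[i, predictors]))]
})
in_model
```

A table built this way would be passed as model_info, with one row per existing model.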
If model_type = "logistic", then model_info must contain a column named
"Intercept", which gives the intercept coefficient of each of the existing
logistic regression models (taken exactly as previously published); this
should be the first column of model_info.
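To illustrate the role of the "Intercept" column, the following base R sketch computes a predicted risk by hand for one hypothetical individual, using the coefficients from Example 1. It assumes the usual logistic form (inverse logit of the intercept plus the linear combination of predictors); the individual's values are made up, and this is a sketch rather than the package's own prediction code.

```r
# Coefficient table with "Intercept" as the first column (values from Example 1)
coefs_table <- data.frame(Intercept = -3.4, SexM = 0.306,
                          Smoking_Status = 0.628, Diabetes = 0.499,
                          CKD = 0.538)

# Hypothetical individual: male smoker without diabetes or CKD
x <- c(SexM = 1, Smoking_Status = 1, Diabetes = 0, CKD = 0)

# Linear predictor = intercept + sum of coefficient * predictor value
beta <- unlist(coefs_table[1, names(x)])
lp <- coefs_table$Intercept + sum(beta * x)

# Inverse logit turns the linear predictor into a predicted risk
risk <- plogis(lp)
```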
If model_type = "survival", then the baseline cumulative hazard of the
model(s) can be specified in cum_hazard. If the baseline cumulative hazard
of the existing survival model is not available, then leave as NULL; this
will limit any validation metrics that can be calculated.
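As a rough sketch of the expected cum_hazard layout, the base R code below builds one table per model with the required column names and collects them in a list. All hazard values are invented. The final lines assume the standard proportional hazards relationship S(t) = exp(-H0(t) * exp(lp)) to show why the baseline cumulative hazard is needed for absolute risk; whether the package computes survival probabilities in exactly this way is an assumption.

```r
# Hypothetical baseline cumulative hazards for two existing survival models;
# each table uses the required column names 'time' and 'hazard'
cum_hazard_1 <- data.frame(time = c(1, 2, 3), hazard = c(0.05, 0.11, 0.18))
cum_hazard_2 <- data.frame(time = c(1, 2, 3), hazard = c(0.04, 0.09, 0.15))

# With several survival models, supply one table per model in a list
cum_hazard_list <- list(cum_hazard_1, cum_hazard_2)

# Survival probability at time 2 under model 1 for a linear predictor of 0.5,
# assuming the proportional hazards form S(t) = exp(-H0(t) * exp(lp))
H0_t <- cum_hazard_1$hazard[cum_hazard_1$time == 2]
surv_prob <- exp(-H0_t * exp(0.5))
```

Without H0(t), only the linear predictor (relative risk) is available, which is consistent with the note above that leaving cum_hazard as NULL limits the validation metrics.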
Note, the column names of model_info should match columns in any new data
that the existing model(s) will be applied to (i.e. any new data that will
be provided to other functions within the package should have
corresponding predictor variables entered through model_info). See
pred_predict, pred_validate, pred_update and pred_stacked_regression for
more information.
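A simple way to check this name matching before calling other functions is sketched below in base R. The objects coefs_table and new_data and their values are hypothetical; the check itself is an illustration and is not taken from the package.

```r
# Hypothetical coefficient table and new dataset
coefs_table <- data.frame(Intercept = -3.4, SexM = 0.306, Diabetes = 0.499)
new_data <- data.frame(SexM = c(1, 0), Diabetes = c(0, 1), Age = c(60, 72))

# Every predictor column in the coefficient table (excluding the intercept)
# should have a column of the same name in the new data
required <- setdiff(names(coefs_table), "Intercept")
missing_vars <- setdiff(required, names(new_data))

if (length(missing_vars) > 0) {
  stop("Predictors missing from new data: ",
       paste(missing_vars, collapse = ", "))
}
```

Extra columns in the new data (Age here) do not affect this check; only predictors named in the coefficient table are required.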
Examples
#Example 1 - logistic regression existing model
# create a data.frame of the model coefficients, with columns being variables
coefs_table <- data.frame("Intercept" = -3.4,
"SexM" = 0.306,
"Smoking_Status" = 0.628,
"Diabetes" = 0.499,
"CKD" = 0.538)
#pass this into pred_input_info()
Existing_Logistic_Model <- pred_input_info(model_type = "logistic",
model_info = coefs_table)
summary(Existing_Logistic_Model)
#> Information about 1 existing model(s) of type 'logistic'
#>
#> Model Coefficients
#> =================================
#> Intercept SexM Smoking_Status Diabetes CKD
#> 1 -3.4 0.306 0.628 0.499 0.538
#>
#> Model Functional Form
#> =================================
#> SexM + Smoking_Status + Diabetes + CKD
#Example 2 - survival model example; uses an example dataset within the
# package.
pred_input_info(model_type = "survival",
model_info = SYNPM$Existing_TTE_models[2,],
cum_hazard = SYNPM$TTE_mod2_baseline)
#>
#> Formula:
#> ~Age + SexM + Smoking_Status + Diabetes + Creatinine
#> <environment: 0x5653398b78a8>
#>
#> Coefficients:
#> Age SexM Smoking_Status Diabetes Creatinine
#> 2 0.02089659 0.2038455 0.5118238 0.1457449 0.3938243
#Example 3 - Input information about multiple models
summary(pred_input_info(model_type = "logistic",
model_info = SYNPM$Existing_logistic_models))
#> Information about 3 existing model(s) of type 'logistic'
#>
#> Model Coefficients
#> =================================
#> [[1]]
#> Intercept Age SexM Smoking_Status Diabetes Creatinine
#> 1 -3.995452 0.01150886 0.2673809 0.7511888 0.5230999 0.5781238
#>
#> [[2]]
#> Intercept SexM Smoking_Status Diabetes Creatinine
#> 2 -2.281796 0.2227019 0.5281743 0.2002136 0.4337511
#>
#> [[3]]
#> Intercept Smoking_Status Diabetes Creatinine
#> 3 -3.012865 0.5645417 -0.1223695 0.7314432
#>
#>
#> Model Functional Form
#> =================================
#> Model 1: Age + SexM + Smoking_Status + Diabetes + Creatinine
#> Model 2: SexM + Smoking_Status + Diabetes + Creatinine
#> Model 3: Smoking_Status + Diabetes + Creatinine