Use an existing prediction model to estimate predicted risks of the outcome for each observation in a new dataset.
Usage
pred_predict(
x,
new_data,
binary_outcome = NULL,
survival_time = NULL,
event_indicator = NULL,
time_horizon = NULL
)
Arguments
- x
an object of class "predinfo" produced by calling pred_input_info.
- new_data
data.frame upon which predictions are obtained using the prediction model.
- binary_outcome
Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for model_type = "logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.
- survival_time
Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for model_type = "survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.
- event_indicator
Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for model_type = "survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.
- time_horizon
for survival models, an integer giving the time horizon (post baseline) at which a prediction is required (i.e. the t at which P(T<t) should be estimated). Currently, this must match a time in x$cum_hazard. If left as NULL, no predicted risks will be returned, just the linear predictor.
Value
pred_predict returns a list containing the following components:
LinearPredictor = the linear predictor for each observation in the new data (i.e., the linear combination of the model's predictor variables and their corresponding coefficients)
PredictedRisk = the predicted risk for each observation in the new data
TimeHorizon = for survival models, an integer giving the time horizon at which a prediction is made
Outcomes = vector of outcomes/endpoints (if available).
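The LinearPredictor component described above can be reproduced by hand for a simple case. The following is a minimal base R sketch, not the package's internal code: it reuses the coefficients and data from Example 1 on this page, and the object names (coefs, X, linear_predictor) are illustrative only.

```r
# Coefficients taken from Example 1 on this page
coefs <- c(Intercept = -3.4, Sex_M = 0.306, Smoking_Status = 0.628)

# Predictor values after dummy coding (Sex_M = 1 for "M", 0 for "F")
new_data <- data.frame(Sex_M          = c(1, 0, 1, 1, 0, 0, 1),
                       Smoking_Status = c(1, 0, 0, 1, 1, 0, 1))

# Design matrix: a column of ones for the intercept, followed by the
# predictors in the same order as the coefficients
X <- cbind(Intercept = 1, as.matrix(new_data[, names(coefs)[-1]]))

# Linear predictor: one value per row of new_data
linear_predictor <- as.vector(X %*% coefs)
linear_predictor
```

The result should agree with the $LinearPredictor values printed in Example 1.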
Details
This function takes the relevant information about the existing prediction model (as supplied by calling pred_input_info), and returns the linear predictor and predicted risks for each individual/observation in new_data.
If the existing prediction model is based on logistic regression (i.e., if x$model_type == "logistic"), the predicted risks will be the predicted probability of the binary outcome conditional on the predictor variables in the new data (i.e., P(Y=1 | X)). If the existing prediction model is based on a time-to-event/survival model (i.e., if x$model_type == "survival"), the predicted risks can only be calculated if a baseline cumulative hazard is provided; in this case, the predicted risks will be one minus the survival probability at the time horizon (i.e., 1 - S(t | X), where S(t | X) = P(T > t | X) and t is the time horizon).
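The two risk definitions above can be sketched in base R. This is an illustration under stated assumptions rather than the package's implementation: the logistic part applies the inverse logit to linear predictor values from Example 1, and the survival part uses the standard proportional hazards relationship 1 - exp(-H0(t) * exp(LP)). The baseline cumulative hazard table (with columns time and hazard) and the survival linear predictor values are hypothetical.

```r
# Logistic model: the inverse logit of the linear predictor gives P(Y = 1 | X)
lp_logistic   <- c(-2.466, -3.400, -3.094)   # values from Example 1
risk_logistic <- plogis(lp_logistic)

# Survival model: risk at time t is 1 - exp(-H0(t) * exp(LP)).
# ASSUMPTION: hypothetical baseline cumulative hazard values, stored as a
# data.frame with 'time' and 'hazard' columns.
cum_hazard <- data.frame(time   = 1:5,
                         hazard = c(0.02, 0.05, 0.08, 0.10, 0.12))

# The time horizon must match a time present in the table
time_horizon <- 5
stopifnot(time_horizon %in% cum_hazard$time)
H0_t <- cum_hazard$hazard[cum_hazard$time == time_horizon]

lp_survival   <- c(0.85, 1.20, 0.75)         # hypothetical linear predictors
risk_survival <- 1 - exp(-H0_t * exp(lp_survival))

risk_logistic
risk_survival
```

In both cases a larger linear predictor gives a larger predicted risk, and the risk always lies between 0 and 1.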
new_data should be a data.frame, where each row should be an observation (e.g. patient) and each variable/column should be a predictor variable. The predictor variables need to include (as a minimum) all of the predictor variables that are included in the existing prediction model (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data).
Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function. dummy_vars can help with this. See examples.
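To make the dummy-coding requirement concrete, the following base R sketch recodes a factor by hand and then checks that every predictor named in the model is present. It is not how dummy_vars is implemented; the column name Sex_M follows the naming used in Example 1, and model_vars is a hypothetical helper vector.

```r
# Data with a factor predictor, as in Example 1
new_df <- data.frame(Sex = factor(c("M", "F", "M", "M", "F", "F", "M")),
                     Smoking_Status = c(1, 0, 0, 1, 1, 0, 1))

# Manual 0/1 indicator for the "M" level of Sex
new_df$Sex_M <- as.integer(new_df$Sex == "M")

# Hypothetical vector of predictor names used by the existing model
# (excluding the intercept)
model_vars <- c("Sex_M", "Smoking_Status")

# Check that every model predictor has a matching column in the data
missing_vars <- setdiff(model_vars, names(new_df))
if (length(missing_vars) > 0) {
  stop("Missing predictor columns: ", paste(missing_vars, collapse = ", "))
}

new_df[, model_vars]
```

A check like this before calling pred_predict can catch naming mismatches between the model specification and the data early.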
binary_outcome, survival_time and event_indicator are used to specify the outcome variable(s) within new_data (use binary_outcome if x$model_type = "logistic", or use survival_time and event_indicator if x$model_type = "survival").
Examples
#Example 1 - logistic regression existing model - shows handling of factor variables
coefs_table <- data.frame("Intercept" = -3.4,
                          "Sex_M" = 0.306,
                          "Smoking_Status" = 0.628)
existing_Logistic_Model <- pred_input_info(model_type = "logistic",
                                           model_info = coefs_table)
new_df <- data.frame("Sex" = as.factor(c("M", "F", "M", "M", "F", "F", "M")),
                     "Smoking_Status" = c(1, 0, 0, 1, 1, 0, 1))
#new_df has a factor variable, so needs indicator variables creating before pred_predict:
new_df_indicators <- dummy_vars(new_df)
pred_predict(x = existing_Logistic_Model,
             new_data = new_df_indicators)
#> $LinearPredictor
#> [1] -2.466 -3.400 -3.094 -2.466 -2.772 -3.400 -2.466
#>
#> $PredictedRisk
#> [1] 0.07827635 0.03229546 0.04335543 0.07827635 0.05885613 0.03229546 0.07827635
#>
#> $Outcomes
#> NULL
#>
#Example 2 - survival model example; uses an example dataset within the
# package. Multiple existing models
model2 <- pred_input_info(model_type = "survival",
                          model_info = SYNPM$Existing_TTE_models,
                          cum_hazard = list(SYNPM$TTE_mod1_baseline,
                                            SYNPM$TTE_mod2_baseline,
                                            SYNPM$TTE_mod3_baseline))
pred_predict(x = model2,
             new_data = SYNPM$ValidationData[1:10,],
             survival_time = "ETime",
             event_indicator = "Status",
             time_horizon = 5)
#> [[1]]
#> [[1]]$LinearPredictor
#> [1] 0.8512214 1.2003842 0.7529033 1.7412421 1.4264254 0.8998310 0.9082076
#> [8] 1.3527208 1.1508770 0.5215825
#>
#> [[1]]$PredictedRisk
#> [1] 0.2466734 0.3307674 0.2264248 0.4983138 0.3955802 0.2572276 0.2590832
#> [8] 0.3735659 0.3176582 0.1843038
#>
#> [[1]]$TimeHorizon
#> [1] 5
#>
#> [[1]]$Outcomes
#> [1] 5.00000000+ 0.02824273 5.00000000+ 2.77747285 5.00000000+ 5.00000000+
#> [7] 2.99792812 3.19669111 2.78071011 0.29270868
#>
#>
#> [[2]]
#> [[2]]$LinearPredictor
#> [1] 1.412099 1.693712 1.336624 1.907005 1.859033 1.363840 1.470386 1.596060
#> [9] 1.710236 1.178556
#>
#> [[2]]$PredictedRisk
#> [1] 0.5637400 0.6669059 0.5366225 0.7435170 0.7266379 0.5463529 0.5849280
#> [8] 0.6310348 0.6729511 0.4814651
#>
#> [[2]]$TimeHorizon
#> [1] 5
#>
#> [[2]]$Outcomes
#> [1] 5.00000000+ 0.02824273 5.00000000+ 2.77747285 5.00000000+ 5.00000000+
#> [7] 2.99792812 3.19669111 2.78071011 0.29270868
#>
#>
#> [[3]]
#> [[3]]$LinearPredictor
#> [1] 0.8162710 1.1756009 0.8363372 1.2931219 1.4067912 0.8589199 0.8759559
#> [8] 1.4347014 1.2503379 0.6006843
#>
#> [[3]]$PredictedRisk
#> [1] 0.3090821 0.4111573 0.3142406 0.4487901 0.4869303 0.3201236 0.3246159
#> [8] 0.4965302 0.4348664 0.2577218
#>
#> [[3]]$TimeHorizon
#> [1] 5
#>
#> [[3]]$Outcomes
#> [1] 5.00000000+ 0.02824273 5.00000000+ 2.77747285 5.00000000+ 5.00000000+
#> [7] 2.99792812 3.19669111 2.78071011 0.29270868
#>
#>