Use an existing prediction model to estimate predicted risks of the outcome for each observation in a new dataset.

Usage

pred_predict(
  x,
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL,
  time_horizon = NULL
)

Arguments

x

an object of class "predinfo" produced by calling pred_input_info.

new_data

data.frame upon which predictions are obtained using the prediction model.

binary_outcome

Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for model_type="logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.

survival_time

Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for model_type="survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.

event_indicator

Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for model_type="survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.

time_horizon

for survival models, an integer giving the time horizon (post baseline) at which a prediction is required (i.e. the t at which P(T<t) should be estimated). Currently, this must match a time in x$cum_hazard. If left as NULL, no predicted risks will be returned, just the linear predictor.

Value

pred_predict returns a list containing the following components:

  • LinearPredictor = the linear predictor for each observation in the new data (i.e., the linear combination of the model's predictor variables and their corresponding coefficients)

  • PredictedRisk = the predicted risk for each observation in the new data

  • TimeHorizon = for survival models, an integer giving the time horizon at which a prediction is made

  • Outcomes = vector of outcomes/endpoints (if available).

Details

This function takes the relevant information about the existing prediction model (as supplied by calling pred_input_info), and returns the linear predictor and predicted risks for each individual/observation in new_data.

If the existing prediction model is based on logistic regression (i.e., if x$model_type == "logistic"), the predicted risks will be the predicted probability of the binary outcome conditional on the predictor variables in the new data (i.e., P(Y=1 | X)). If the existing prediction model is based on a time-to-event/survival model (i.e., if x$model_type == "survival"), the predicted risks can only be calculated if a baseline cumulative hazard is provided; in this case, the predicted risks will be one minus the survival probability at the time horizon (i.e., 1 - S(t | X), where S(t | X) = P(T > t | X) and t is the time horizon).
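The mapping from linear predictor to predicted risk described above can be sketched in base R. This is an illustration rather than the package's internal code: the logistic part uses the linear predictor values printed in Example 1, while the survival part assumes a proportional-hazards form, S(t | X) = exp(-H0(t) * exp(LP)), with a hypothetical baseline cumulative hazard H0_t and hypothetical linear predictor values.

```r
# Logistic model: linear predictor values as printed in Example 1
lp_logistic <- c(-2.466, -3.400, -3.094)

# Inverse-logit gives P(Y = 1 | X); plogis(x) is 1 / (1 + exp(-x))
risk_logistic <- plogis(lp_logistic)

# Survival model (assumed proportional-hazards form, not confirmed by this page)
H0_t    <- 0.12                 # hypothetical baseline cumulative hazard at the time horizon
lp_surv <- c(0.85, 1.20, 0.75)  # hypothetical linear predictor values

surv_prob     <- exp(-H0_t * exp(lp_surv))  # S(t | X)
risk_survival <- 1 - surv_prob              # predicted risk by the time horizon

print(risk_logistic)
print(risk_survival)
```

The logistic risks computed this way agree with the PredictedRisk values shown in Example 1 for the same three observations.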

new_data should be a data.frame, where each row is an observation (e.g. patient) and each column is a predictor variable. The predictor variables need to include (as a minimum) all of the predictor variables that are included in the existing prediction model (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data). Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function. dummy_vars can help with this. See examples.
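To make the name-matching requirement concrete, the following base R sketch builds the indicator column by hand and computes the linear predictor for the data used in Example 1. It is only an illustration of the idea; it does not use dummy_vars or pred_predict, and the variable names coefs and predictors are introduced here for the example.

```r
# Data from Example 1: one factor predictor and one already-numeric predictor
new_df <- data.frame(Sex = factor(c("M", "F", "M", "M", "F", "F", "M")),
                     Smoking_Status = c(1, 0, 0, 1, 1, 0, 1))

# Hand-made 0/1 indicator, named to match the coefficient name "Sex_M"
new_df$Sex_M <- as.integer(new_df$Sex == "M")

# Coefficients of the existing model, as in Example 1
coefs <- c(Intercept = -3.4, Sex_M = 0.306, Smoking_Status = 0.628)

# Every non-intercept coefficient name must be a column of the new data
predictors <- setdiff(names(coefs), "Intercept")
stopifnot(all(predictors %in% names(new_df)))

# Linear predictor: intercept plus the sum of coefficient * predictor value
lp <- coefs[["Intercept"]] +
  as.vector(as.matrix(new_df[, predictors]) %*% coefs[predictors])
print(lp)
```

The resulting values agree with the LinearPredictor output printed in Example 1.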

binary_outcome, survival_time and event_indicator are used to specify the outcome variable(s) within new_data (use binary_outcome if x$model_type == "logistic", or use survival_time and event_indicator if x$model_type == "survival").
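Before passing an outcome column name, it can be useful to confirm that the column exists and is coded as described under Arguments. The helper below, check_binary_outcome, is hypothetical (it is not part of the package) and is a minimal sketch for the logistic case only.

```r
# Hypothetical helper: checks that the named column exists in new_data and
# contains only 0/1 values (missing values are ignored for the coding check)
check_binary_outcome <- function(new_data, binary_outcome = NULL) {
  if (is.null(binary_outcome)) {
    return(invisible(TRUE))  # no outcome supplied; nothing to check
  }
  if (!binary_outcome %in% names(new_data)) {
    stop("Column '", binary_outcome, "' not found in new_data")
  }
  vals <- unique(stats::na.omit(new_data[[binary_outcome]]))
  if (!all(vals %in% c(0, 1))) {
    stop("Column '", binary_outcome, "' must be coded 0 (non-event) and 1 (event)")
  }
  invisible(TRUE)
}

# Illustrative data with a correctly coded outcome column "Y"
example_df <- data.frame(Smoking_Status = c(1, 0, 1), Y = c(0, 1, 0))
check_binary_outcome(example_df, binary_outcome = "Y")
```

An analogous check could be written for survival_time and event_indicator (1 for event, 0 for censoring).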

Examples

#Example 1 - logistic regression existing model - shows handling of factor variables
coefs_table <- data.frame("Intercept" = -3.4,
                          "Sex_M" = 0.306,
                          "Smoking_Status" = 0.628)
existing_Logistic_Model <- pred_input_info(model_type = "logistic",
                                           model_info = coefs_table)
new_df <- data.frame("Sex" = as.factor(c("M", "F", "M", "M", "F", "F", "M")),
                     "Smoking_Status" = c(1, 0, 0, 1, 1, 0, 1))
#new_df has a factor variable, so needs indicator variables creating before pred_predict:
new_df_indicators <- dummy_vars(new_df)
pred_predict(x = existing_Logistic_Model,
             new_data = new_df_indicators)
#> $LinearPredictor
#> [1] -2.466 -3.400 -3.094 -2.466 -2.772 -3.400 -2.466
#> 
#> $PredictedRisk
#> [1] 0.07827635 0.03229546 0.04335543 0.07827635 0.05885613 0.03229546 0.07827635
#> 
#> $Outcomes
#> NULL
#> 

#Example 2 - survival model example; uses an example dataset within the
#             package. Multiple existing models
model2 <- pred_input_info(model_type = "survival",
                          model_info = SYNPM$Existing_TTE_models,
                          cum_hazard = list(SYNPM$TTE_mod1_baseline,
                                            SYNPM$TTE_mod2_baseline,
                                            SYNPM$TTE_mod3_baseline))
pred_predict(x = model2,
             new_data = SYNPM$ValidationData[1:10,],
             survival_time = "ETime",
             event_indicator = "Status",
             time_horizon = 5)
#> [[1]]
#> [[1]]$LinearPredictor
#>  [1] 0.8512214 1.2003842 0.7529033 1.7412421 1.4264254 0.8998310 0.9082076
#>  [8] 1.3527208 1.1508770 0.5215825
#> 
#> [[1]]$PredictedRisk
#>  [1] 0.2466734 0.3307674 0.2264248 0.4983138 0.3955802 0.2572276 0.2590832
#>  [8] 0.3735659 0.3176582 0.1843038
#> 
#> [[1]]$TimeHorizon
#> [1] 5
#> 
#> [[1]]$Outcomes
#>  [1] 5.00000000+ 0.02824273  5.00000000+ 2.77747285  5.00000000+ 5.00000000+
#>  [7] 2.99792812  3.19669111  2.78071011  0.29270868 
#> 
#> 
#> [[2]]
#> [[2]]$LinearPredictor
#>  [1] 1.412099 1.693712 1.336624 1.907005 1.859033 1.363840 1.470386 1.596060
#>  [9] 1.710236 1.178556
#> 
#> [[2]]$PredictedRisk
#>  [1] 0.5637400 0.6669059 0.5366225 0.7435170 0.7266379 0.5463529 0.5849280
#>  [8] 0.6310348 0.6729511 0.4814651
#> 
#> [[2]]$TimeHorizon
#> [1] 5
#> 
#> [[2]]$Outcomes
#>  [1] 5.00000000+ 0.02824273  5.00000000+ 2.77747285  5.00000000+ 5.00000000+
#>  [7] 2.99792812  3.19669111  2.78071011  0.29270868 
#> 
#> 
#> [[3]]
#> [[3]]$LinearPredictor
#>  [1] 0.8162710 1.1756009 0.8363372 1.2931219 1.4067912 0.8589199 0.8759559
#>  [8] 1.4347014 1.2503379 0.6006843
#> 
#> [[3]]$PredictedRisk
#>  [1] 0.3090821 0.4111573 0.3142406 0.4487901 0.4869303 0.3201236 0.3246159
#>  [8] 0.4965302 0.4348664 0.2577218
#> 
#> [[3]]$TimeHorizon
#> [1] 5
#> 
#> [[3]]$Outcomes
#>  [1] 5.00000000+ 0.02824273  5.00000000+ 2.77747285  5.00000000+ 5.00000000+
#>  [7] 2.99792812  3.19669111  2.78071011  0.29270868 
#> 
#>