Map new data to a predinfo object — map

This function takes a predinfo object and maps new data onto it, checking that the two are consistent. It is not usually called directly, but rather from other functions in the package, such as pred_predict.

Usage

map_newdata(
  x,
  new_data,
  binary_outcome = NULL,
  survival_time = NULL,
  event_indicator = NULL
)

Arguments

x: an object of class "predinfo".
new_data: data.frame upon which the prediction model should be applied (for subsequent validation/model updating/model aggregation).
binary_outcome: Character variable giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for model_type="logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.
survival_time: Character variable giving the name of the column in new_data that represents the observed survival times. Only relevant for model_type="survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.
event_indicator: Character variable giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for model_type="survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.

Value

Returns a list containing the predinfo object, the new_data, and the outcomes (the elements modelinfo, PredictionData and Outcomes in the example output below).

Details

This function maps a new dataset onto a predinfo object. The new dataset might be a validation dataset (to test the performance of the existing prediction model) and/or the dataset on which one wishes to apply model updating methods to revise the model. In either case, it should be supplied through new_data as a data.frame. Each row should be an observation (e.g. a patient) and each column should be a variable. The columns need to include (as a minimum) all of the predictor variables in the existing prediction model (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data).
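The name-matching requirement can be illustrated with a small base R check. This is only a sketch of the idea, not the package's internal code: model_vars and the toy new_data below are hypothetical stand-ins for the variable names given in model_info and for a real dataset.

```r
# Hypothetical predictor names from an existing model (assumed for illustration)
model_vars <- c("Age", "SexM", "Creatinine")

# Hypothetical new dataset that lacks one of the required predictors
new_data <- data.frame(Age  = c(48.7, 51.6),
                       SexM = c(1, 0),
                       Y    = c(0, 1))

# Names required by the model but absent from new_data
missing_vars <- setdiff(model_vars, names(new_data))

if (length(missing_vars) > 0) {
  message("new_data is missing: ", paste(missing_vars, collapse = ", "))
}
```

Extra columns in new_data (such as the outcome Y here) do not affect this check; only predictors that the model needs and new_data lacks are flagged.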

Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function. dummy_vars can help with this.
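For readers who want to see what such a conversion involves, the following base R sketch uses model.matrix() rather than dummy_vars; it is an alternative shown purely for illustration. The data and column names are hypothetical, and the resulting dummy variable names may differ from those produced by dummy_vars, so they should be checked against the names used in model_info.

```r
# Hypothetical data with one factor predictor (assumed for illustration)
raw_data <- data.frame(Age = c(48, 52, 61),
                       Sex = factor(c("M", "F", "M")))

# model.matrix() expands factors into 0/1 indicator columns using
# treatment contrasts; the first column is the intercept, which we drop
dummies <- model.matrix(~ Sex, data = raw_data)[, -1, drop = FALSE]

# Combine the numeric predictor with the new indicator column
new_data <- cbind(raw_data["Age"], dummies)
names(new_data)
```

Here the reference level is "F" (the first level alphabetically), so the single indicator column marks rows where Sex is "M".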

binary_outcome, survival_time and event_indicator specify the outcome variable(s) within new_data, if relevant (use binary_outcome if model_type = "logistic", or use survival_time and event_indicator if model_type = "survival"). For example, when validating an existing model, these inputs specify the columns of new_data that will be used to assess the predictive performance of the model in the validation dataset. If new_data does not contain outcomes, leave these inputs at their default of NULL.
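Before passing an outcome column name, it can help to confirm that the column exists and is coded as expected. The sketch below does this for a binary outcome in base R; it is an illustrative pre-check on hypothetical data, not a description of what map_newdata does internally.

```r
# Hypothetical dataset and outcome column name (assumed for illustration)
new_data <- data.frame(Age = c(48, 52, 61),
                       Y   = c(0, 1, 0))
binary_outcome <- "Y"

# TRUE only if the column is present and contains nothing but 0s and 1s
outcome_ok <- binary_outcome %in% names(new_data) &&
  all(new_data[[binary_outcome]] %in% c(0, 1))

if (!outcome_ok) {
  stop("binary_outcome must name a column of new_data coded 0/1")
}
```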

Examples

#as above, this function is not usually called directly, but an example of
#such use is:
model1 <- pred_input_info(model_type = "logistic",
                          model_info = SYNPM$Existing_logistic_models[1,])
map_newdata(x = model1,
            new_data = SYNPM$ValidationData[1:10,],
            binary_outcome = "Y")
#> $modelinfo
#> 
#>  Formula: 
#> ~Age + SexM + Smoking_Status + Diabetes + Creatinine
#> <environment: 0x56533b043d00>
#> 
#>  Coefficients: 
#>   Intercept        Age      SexM Smoking_Status  Diabetes Creatinine
#> 1 -3.995452 0.01150886 0.2673809      0.7511888 0.5230999  0.5781238
#> 
#> $PredictionData
#>         Age SexM Smoking_Status Diabetes Creatinine      ETime Status Y
#> 1  48.68421    1              0        0  0.4847849 5.00000000      0 0
#> 2  51.62041    1              0        0  1.0440606 0.02824273      1 1
#> 3  51.37057    0              0        0  0.6682031 5.00000000      0 0
#> 4  50.97524    1              0        1  1.2498120 2.77747285      1 0
#> 5  52.47209    1              0        0  1.4186517 5.00000000      0 0
#> 6  43.69205    1              0        0  0.6271332 5.00000000      0 0
#> 7  49.92449    1              0        0  0.5669761 2.99792812      1 0
#> 8  42.56161    0              0        0  1.7943725 3.19669111      1 0
#> 9  57.96719    0              0        0  1.2668577 2.78071011      1 0
#> 10 51.18527    0              0        0  0.2766686 0.29270868      1 1
#> 
#> $Outcomes
#>  [1] 0 1 0 0 0 0 0 0 0 1
#>