This function takes a predinfo object and applies (maps) a new dataset to it, checking that the two are consistent. It is not usually called directly, but rather within other functions in the package, such as pred_predict.
Usage
map_newdata(
x,
new_data,
binary_outcome = NULL,
survival_time = NULL,
event_indicator = NULL
)
Arguments
- x
an object of class "predinfo".
- new_data
data.frame upon which the prediction model should be applied (for subsequent validation/model updating/model aggregation).
- binary_outcome
Character string giving the name of the column in new_data that represents the observed binary outcomes (should be coded 0 and 1 for non-event and event, respectively). Only relevant for model_type = "logistic"; leave as NULL otherwise. Leave as NULL if new_data does not contain any outcomes.
- survival_time
Character string giving the name of the column in new_data that represents the observed survival times. Only relevant for model_type = "survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.
- event_indicator
Character string giving the name of the column in new_data that represents the observed survival indicator (1 for event, 0 for censoring). Only relevant for model_type = "survival"; leave as NULL otherwise. Leave as NULL if new_data does not contain any survival outcomes.
Details
This function maps a new dataset onto a predinfo object. The new dataset might be a validation dataset (to test the performance of the existing prediction model) and/or the dataset on which one wishes to apply model updating methods to revise the model. In either case, it should be supplied in new_data as a data.frame. Each row should be an observation (e.g. patient) and each variable/column should be a predictor variable. The predictor variables must include (as a minimum) all of the predictor variables in the existing prediction model (i.e., each of the variable names supplied to pred_input_info, through the model_info parameter, must match the name of a variable in new_data).
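The name-matching requirement above can be illustrated with a small base-R check. This is a minimal sketch, not the package's own code: check_predictor_names is a hypothetical helper, the predictor names are taken from the Formula shown in the Examples output, and the two data rows are copied from that output.

```r
# Hypothetical helper (not part of the package): stop if any model
# predictor has no matching column name in new_data.
check_predictor_names <- function(predictors, new_data) {
  stopifnot(is.data.frame(new_data), is.character(predictors))
  missing <- setdiff(predictors, names(new_data))
  if (length(missing) > 0) {
    stop("new_data is missing predictor column(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Predictor names from the Formula in the Examples output.
predictors <- c("Age", "SexM", "Smoking_Status", "Diabetes", "Creatinine")

# First two rows of the example validation data; extra columns (Y) are allowed.
ok_data <- data.frame(Age = c(48.68421, 51.62041), SexM = c(1, 1),
                      Smoking_Status = c(0, 0), Diabetes = c(0, 0),
                      Creatinine = c(0.4847849, 1.0440606), Y = c(0, 1))
check_predictor_names(predictors, ok_data)  # passes; returns TRUE invisibly

# Dropping a predictor column triggers an informative error.
bad_data <- ok_data[setdiff(names(ok_data), "SexM")]
res <- tryCatch(check_predictor_names(predictors, bad_data),
                error = function(e) conditionMessage(e))
res
```

Using setdiff() means extra columns in new_data (such as outcome columns) are ignored; only predictors absent from new_data are reported.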
Any factor variables within new_data must be converted to dummy (0/1) variables before calling this function; dummy_vars can help with this.
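dummy_vars is the package's own helper for this step. As a base-R alternative, the sketch below uses stats::model.matrix() to turn a factor into 0/1 columns. The toy data frame and its Sex factor are assumptions for illustration; the resulting column name SexM is chosen to mirror the predictor name in the example model, since dummy column names must match those supplied through model_info.

```r
# Toy data (an assumption): Sex stored as a factor, whereas the example
# model expects a 0/1 column named "SexM".
toy <- data.frame(Age = c(48.68421, 51.37057, 50.97524),
                  Sex = factor(c("M", "F", "M"), levels = c("F", "M")))

# model.matrix() applies treatment coding: one 0/1 column per non-reference
# level, named variable + level (here "SexM"). Drop the intercept column.
dummies <- model.matrix(~ Sex, data = toy)[, -1, drop = FALSE]

# Replace the factor column with its dummy column(s).
toy_coded <- cbind(toy[setdiff(names(toy), "Sex")], dummies)
toy_coded
```

This assumes no missing values: by default model.matrix() drops rows containing NA, so its output would no longer line up row-for-row with the original data frame.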
binary_outcome, survival_time and event_indicator are used to specify the outcome variable(s) within new_data, if relevant (use binary_outcome if model_type = "logistic", or use survival_time and event_indicator if model_type = "survival"). For example, if validating an existing model, these inputs specify the columns of new_data that will be used to assess the predictive performance of the model in the validation dataset. If new_data does not contain outcomes, leave these inputs at their default of NULL.
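The outcome arguments act as column selectors. The sketch below shows one way such a selection could work in base R; it is not the package's internal code. extract_outcomes is a hypothetical helper, and returning survival outcomes as a plain data.frame (rather than any other representation the package may use) is an assumption made to keep the example dependency-free. The data values are copied from rows 1 to 4 of the Examples output.

```r
# Hypothetical helper (not predRupdate's internals): return the observed
# outcomes named by the user, or NULL when none were supplied.
extract_outcomes <- function(new_data, model_type,
                             binary_outcome = NULL,
                             survival_time = NULL,
                             event_indicator = NULL) {
  # Stop if a named column does not exist in new_data.
  require_cols <- function(cols) {
    absent <- setdiff(cols, names(new_data))
    if (length(absent) > 0) stop("column(s) not in new_data: ",
                                 paste(absent, collapse = ", "), call. = FALSE)
  }
  # Require strict 0/1 coding (NA values also fail this check).
  require_01 <- function(v, what) {
    if (!all(v %in% c(0, 1))) stop(what, " must be coded 0/1", call. = FALSE)
  }
  if (model_type == "logistic") {
    if (is.null(binary_outcome)) return(NULL)  # no outcomes in new_data
    require_cols(binary_outcome)
    y <- new_data[[binary_outcome]]
    require_01(y, "binary_outcome")
    return(y)
  }
  if (model_type == "survival") {
    if (is.null(survival_time) && is.null(event_indicator)) return(NULL)
    if (is.null(survival_time) || is.null(event_indicator))
      stop("supply both survival_time and event_indicator", call. = FALSE)
    require_cols(c(survival_time, event_indicator))
    status <- new_data[[event_indicator]]
    require_01(status, "event_indicator")
    return(data.frame(time = new_data[[survival_time]], status = status))
  }
  stop('model_type must be "logistic" or "survival"', call. = FALSE)
}

# Outcome columns from rows 1-4 of the example validation data.
val <- data.frame(ETime = c(5, 0.02824273, 5, 2.77747285),
                  Status = c(0, 1, 0, 1),
                  Y = c(0, 1, 0, 0))

extract_outcomes(val, "logistic", binary_outcome = "Y")
surv_out <- extract_outcomes(val, "survival",
                             survival_time = "ETime",
                             event_indicator = "Status")
```

Returning NULL when no outcome columns are named mirrors the documented behaviour of leaving these arguments at NULL when new_data has no outcomes.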
Examples
# As above, this function is not usually called directly, but an example of
# such use is:
model1 <- pred_input_info(model_type = "logistic",
model_info = SYNPM$Existing_logistic_models[1,])
map_newdata(x = model1,
new_data = SYNPM$ValidationData[1:10,],
binary_outcome = "Y")
#> $modelinfo
#>
#> Formula:
#> ~Age + SexM + Smoking_Status + Diabetes + Creatinine
#> <environment: 0x56533b043d00>
#>
#> Coefficients:
#> Intercept Age SexM Smoking_Status Diabetes Creatinine
#> 1 -3.995452 0.01150886 0.2673809 0.7511888 0.5230999 0.5781238
#>
#> $PredictionData
#> Age SexM Smoking_Status Diabetes Creatinine ETime Status Y
#> 1 48.68421 1 0 0 0.4847849 5.00000000 0 0
#> 2 51.62041 1 0 0 1.0440606 0.02824273 1 1
#> 3 51.37057 0 0 0 0.6682031 5.00000000 0 0
#> 4 50.97524 1 0 1 1.2498120 2.77747285 1 0
#> 5 52.47209 1 0 0 1.4186517 5.00000000 0 0
#> 6 43.69205 1 0 0 0.6271332 5.00000000 0 0
#> 7 49.92449 1 0 0 0.5669761 2.99792812 1 0
#> 8 42.56161 0 0 0 1.7943725 3.19669111 1 0
#> 9 57.96719 0 0 0 1.2668577 2.78071011 1 0
#> 10 51.18527 0 0 0 0.2766686 0.29270868 1 1
#>
#> $Outcomes
#> [1] 0 1 0 0 0 0 0 0 0 1
#>