This function is included for situations where one has a vector of predicted probabilities from a model and a vector of observed binary outcomes against which to validate those predictions. See pred_validate for the main validation function of this package.

Usage

pred_val_probs(binary_outcome, Prob, cal_plot = TRUE, level = 0.95, ...)

Arguments

binary_outcome

vector of binary outcomes (coded as 1 if the outcome occurred, and 0 otherwise). Must be of the same length as Prob.

Prob

vector of predicted probabilities. Must be of the same length as binary_outcome.

cal_plot

indicates whether a flexible calibration plot should be produced (TRUE) or not (FALSE).

level

the confidence level required for all performance metrics. Must be a value between 0 and 1. Defaults to 0.95.

...

further plotting arguments for the calibration plot. See Details below.

Value

An object of class "predvalidate", which is a list containing relevant calibration and discrimination measures. See pred_validate for details.

Details

This function takes a vector of observed binary outcomes and a corresponding vector of predicted risks (e.g. from a logistic regression clinical prediction model, CPM), and calculates measures of predictive performance. The function is intended as a standalone way of validating predicted risks against binary outcomes, outside of the usual pred_input_info() -> pred_validate() package workflow. See pred_validate for the main validation function of this package.

Various metrics of calibration (agreement between the observed and predicted risks across the full risk range) and discrimination (ability of the model to distinguish between those who develop the outcome and those who do not) are calculated. For calibration, the observed-to-expected ratio, calibration intercept and calibration slope are estimated. The calibration intercept is estimated by fitting a logistic regression model to the observed binary outcomes, with the linear predictor of the model as an offset. For the calibration slope, a logistic regression model is fit to the observed binary outcomes with the linear predictor from the model as the only covariate. For discrimination, the function estimates the area under the receiver operating characteristic curve (AUC). Various other metrics are also calculated to assess overall accuracy (Brier score, Cox-Snell R2, Nagelkerke R2).
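The metrics above can also be computed directly in base R. The following is a minimal sketch on simulated data, not the package's internal implementation; variable names (p, y, lp) are illustrative:

```r
set.seed(42)
p <- runif(500, 0.05, 0.95)   # hypothetical predicted risks
y <- rbinom(500, 1, p)        # observed binary outcomes
lp <- log(p / (1 - p))        # linear predictor (logit of predicted risk)

# Observed-to-expected ratio: observed events over sum of predicted risks
oe_ratio <- sum(y) / sum(p)

# Calibration intercept: logistic model with the linear predictor as an offset
cal_int <- coef(glm(y ~ offset(lp), family = binomial))[1]

# Calibration slope: logistic model with the linear predictor as sole covariate
cal_slope <- coef(glm(y ~ lp, family = binomial))[2]

# AUC via the rank (Mann-Whitney) formulation
r <- rank(p)
n1 <- sum(y); n0 <- sum(1 - y)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Brier score: mean squared difference between predicted risk and outcome
brier <- mean((p - y)^2)
```

Because the outcomes here are simulated from the predicted risks themselves, the observed-to-expected ratio and calibration slope should land near 1, and the calibration intercept near 0.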

A flexible calibration plot can be produced. Use the cal_plot argument to indicate whether the plot should be produced (TRUE) or not (FALSE). See pred_validate for details on this plot, and for details of the optional plotting arguments.

Examples

# simulate some data for purposes of example illustration
set.seed(1234)
x1 <- rnorm(2000)
LP <- -2 + (0.5*x1)
PR <- 1/(1+exp(-LP))
y <- rbinom(2000, 1, PR)

# fit hypothetical model to the simulated data
mod <- glm(y[1:1000] ~ x1[1:1000], family = binomial(link="logit"))

# obtain the predicted risks from the model
pred_risk <- predict(mod, type = "response",
                      newdata = data.frame("x1" = x1[1001:2000]))

# use pred_val_probs to validate the predicted risks against the
# observed outcomes
summary(pred_val_probs(binary_outcome = y[1001:2000],
                        Prob = pred_risk,
                        cal_plot = FALSE))
#> Calibration Measures 
#> --------------------------------- 
#>                         Estimate Lower 95% Confidence Interval
#> Observed:Expected Ratio   0.8018                        0.6740
#> Calibration Intercept    -0.2585                       -0.4563
#> Calibration Slope         1.2460                        0.7817
#>                         Upper 95% Confidence Interval
#> Observed:Expected Ratio                        0.9539
#> Calibration Intercept                         -0.0608
#> Calibration Slope                              1.7102
#> 
#>  Also examine the calibration plot, if produced. 
#> 
#> Discrimination Measures 
#> --------------------------------- 
#>     Estimate Lower 95% Confidence Interval Upper 95% Confidence Interval
#> AUC   0.6523                        0.5998                        0.7048
#> 
#> 
#> Overall Performance Measures 
#> --------------------------------- 
#> Cox-Snell R-squared: 0.0211
#> Nagelkerke R-squared: 0.0416
#> Brier Score (CI): 0.098 (0.0832, 0.1128)
#> 
#>  Also examine the distribution plot of predicted risks.