Technical Background to predRupdate
Glen P. Martin, PhD; David Jenkins, PhD; Matthew Sperrin, PhD
Source:vignettes/predRupdate_technical.Rmd
Preamble
The predRupdate package includes a set of functions to aid in the validation of a clinical prediction model (CPM) on a given dataset, and to apply various model updating and aggregation methods. This vignette gives an overview of the technical details of the methods implemented within the predRupdate package. For an introduction to using the package, please see vignette("predRupdate").
Clinical Prediction Models
Clinical prediction models (CPMs) are statistical models that aim to predict the presence (diagnostic models) or future occurrence (prognostic models) of an outcome of interest for an individual, using information (predictor variables) that are available about that individual at the time the prediction is made. For example, we might use someone’s age, sex, smoking status and family medical history to predict their risk of developing cardiovascular disease in the next ten years. These models can be used to aid clinical decision-making, form the cornerstone of decision support systems, and underpin clinical audit and feedback tasks.
The usual stages for the production of CPMs are: (i) model development and internal validation, (ii) external validation in new data, potentially with updating as needed, and (iii) impact assessment. These are summarised in the table below:
Phase | Description |
---|---|
(i) Model Development and internal validation | The development of the model uses a cohort of patients, where the predictor variables and outcomes are known, to perform variable selection and estimate the prognostic effect (coefficient) of each predictor on the outcome of interest. This is followed by assessment of predictive performance within that data, adjusting for in-sample optimism |
(ii) External validation | Aims to assess the predictive performance of the developed model in data that are different to those used to develop the model (e.g., temporally or geographically different). If performance is deemed unsuitable, the model may undergo updating to tailor it more closely to the new data |
(iii) Impact Assessment | Impact assessment investigates the extent to which the model changes clinical practice and improves patient outcomes |
The aim of predRupdate is to provide tools for stage (ii). Specifically, the package is intended for the situation where a CPM has already been developed (e.g., a model available in the literature), and one wishes to apply the model to a new dataset for validation, updating, or both. The package assumes that one has access to the reported model parameters (all those that are required to make a prediction of risk for a new observation) and an individual participant dataset in which one wishes to apply the model.
This vignette is not intended to be a detailed tutorial of prediction modelling. For that, we refer readers to Riley et al. 2019 and Steyerberg 2009. Instead, this vignette is intended to describe the technical details of how certain methodologies have been implemented in predRupdate.
Predictive Performance Metrics
The predictive performance of a CPM is summarised by its calibration, discrimination and overall accuracy. We provide a brief overview of the technical details that are implemented in predRupdate in the pred_validate() function, but refer readers to the literature for a detailed overview (Riley et al. 2019; Steyerberg et al. 2013; Moons et al. 2012; Altman and Royston 2000; Van Calster et al. 2016; McLernon et al. 2023).
Calibration
Calibration is the agreement of the predicted risks from the CPM with the observed risks in the validation dataset, across the full risk range. A primary method of assessing calibration is to produce a flexible calibration plot, which graphically depicts the estimated risk (x-axis) and the observed probabilities (y-axis). The observed probabilities are obtained by regressing the observed outcomes in the validation dataset against the linear predictor (calculated for each individual in the validation dataset), using loess or splines.
Specifically, suppose an existing (logistic regression) CPM has been developed (in another dataset) to predict the probability of a binary outcome, $Y$. This CPM is given by

$$P(Y_i = 1) = \frac{\exp(LP_i)}{1 + \exp(LP_i)}$$

where $LP_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i,1} + \dots + \hat{\beta}_P X_{i,P}$, with $\hat{\beta}_0, \dots, \hat{\beta}_P$ being the estimated set of regression coefficients (log odds ratios; taken from the original development of the model) and $X_{i,p}$ the value of predictor variable $p$ for individual $i$. This linear combination of the regression coefficients and predictor variables is the linear predictor of the model.
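As a minimal sketch of this calculation in base R, the snippet below computes the linear predictor and predicted risk for a few individuals. The coefficient values, variable names (`age`, `smoker`) and data are hypothetical and chosen purely for illustration; they do not come from any published model, and this is not the package's internal code.

```r
# Hypothetical published coefficients (intercept and log odds ratios)
coefs <- c("(Intercept)" = -3.0, age = 0.03, smoker = 0.6)

# Hypothetical validation data with the same predictor variables
new_data <- data.frame(age = c(45, 60, 72), smoker = c(0, 1, 1))

# Design matrix with a leading column of 1s for the intercept
X <- cbind("(Intercept)" = 1, as.matrix(new_data[, c("age", "smoker")]))

# Linear predictor: LP_i = beta_0 + beta_1 * x_i1 + ... + beta_P * x_iP
lp <- as.vector(X %*% coefs[colnames(X)])

# Predicted risk via the inverse logit: exp(LP) / (1 + exp(LP))
pred_risk <- plogis(lp)
```

Matching coefficients to columns by name (`coefs[colnames(X)]`) guards against the predictors being stored in a different order from the published model.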
To obtain a flexible calibration plot of this model within the validation data, pred_validate() fits the following model to the validation data:

$$\log\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = \alpha + s(LP_i)$$

where $LP_i$ is the linear predictor of the CPM for individual $i$ and $s(\cdot)$ is some smooth function; in pred_validate() this is a natural cubic spline. This model is then used to obtain observed probabilities for each individual in the validation dataset, which are plotted against the corresponding predicted risks obtained from the CPM. A resulting curve close to the diagonal (y=x line) indicates that predicted risks correspond well to observed proportions. In pred_validate(), a histogram of the predicted risks is overlaid on the calibration plot, to visually summarise their distribution.
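The flexible calibration model can be sketched in base R as follows. This is an illustration on simulated data, not the internal code of pred_validate(): the spline degrees of freedom (`df = 3`), the simulated linear predictor and all object names are assumptions made for the example.

```r
library(splines)  # ns() provides a natural cubic spline basis

set.seed(1)
n <- 2000
lp <- rnorm(n, mean = -1, sd = 1)            # simulated linear predictor of the existing CPM
pred_risk <- plogis(lp)                      # predicted risks from the CPM
y <- rbinom(n, 1, plogis(0.3 + 0.8 * lp))    # simulated outcomes (CPM deliberately miscalibrated)

# Flexible calibration model: logit(P(Y = 1)) = alpha + s(LP)
cal_fit <- glm(y ~ ns(lp, df = 3), family = binomial)

# "Observed" probabilities for each individual, from the smooth fit
obs_prob <- fitted(cal_fit)

# Calibration curve: observed probabilities against predicted risks
if (interactive()) {
  ord <- order(pred_risk)
  plot(pred_risk[ord], obs_prob[ord], type = "l", xlim = c(0, 1), ylim = c(0, 1),
       xlab = "Predicted risk", ylab = "Observed probability")
  abline(0, 1, lty = 2)  # diagonal = perfect calibration
}
```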
If validating an existing time-to-event CPM, pred_validate() produces the flexible calibration plot using methods similar to those described above for logistic regression models. Here, the observed probabilities are obtained by fitting a Cox proportional hazards model to the time-to-event outcome in the validation dataset, with the linear predictor of the model entered as a natural cubic spline. A time horizon must also be specified (i.e., the time during follow-up at which to assess calibration). Such a plot can only be produced for an existing time-to-event CPM if the cumulative baseline hazard of the model is supplied.
Alongside flexible calibration plots, pred_validate() also calculates the calibration slope. The calibration slope (ideal value 1) indicates the level of over-fitting of the model, with values less than 1 indicating more over-fitting. In pred_validate() this is estimated by fitting a logistic regression model (if the existing CPM is a logistic model) or Cox proportional hazards model (if the existing CPM is a survival model) to the observed outcomes in the validation dataset with the linear predictor from the CPM as the only covariate.
Finally, calibration-in-the-large summarises how close the mean predicted risk is to the mean outcome proportion in the validation dataset. If validating a logistic regression CPM, this is quantified using the calibration intercept, estimated using the same model as for the calibration slope, but with the slope fixed at 1 (i.e., the linear predictor is entered as an offset). Here, a calibration intercept less than 0 would indicate that the mean predicted risk is higher than the observed outcome proportion. If validating a time-to-event (survival) CPM, then calibration-in-the-large is quantified with the observed:expected ratio (ideal value 1) at a fixed time horizon. Here, the observed proportion is obtained using the Kaplan-Meier estimator. Such a metric can only be produced for an existing time-to-event CPM if the cumulative baseline hazard of the model is supplied, so that the predicted risks at the time horizon can be calculated.
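For a logistic regression CPM, the calibration slope and calibration intercept described above can be sketched with two calls to glm(). The data are simulated and the object names are illustrative assumptions; this is not the package's internal code.

```r
set.seed(2)
n <- 5000
lp <- rnorm(n, mean = -1, sd = 1)             # simulated linear predictor of the existing CPM
y <- rbinom(n, 1, plogis(-0.2 + 0.7 * lp))    # simulated outcomes; true slope on the LP is 0.7

# Calibration slope: linear predictor as the only covariate
slope_fit <- glm(y ~ lp, family = binomial)
cal_slope <- unname(coef(slope_fit)["lp"])

# Calibration intercept (calibration-in-the-large): slope fixed at 1 via an offset
citl_fit <- glm(y ~ 1 + offset(lp), family = binomial)
cal_intercept <- unname(coef(citl_fit)["(Intercept)"])
```

A slope estimate below 1, as simulated here, would suggest that the predicted risks are too extreme in the validation data.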
Discrimination
Discrimination of a CPM is the ability of the model to differentiate those who experience the outcome from those who do not; i.e., does the model estimate a higher predicted risk, on average, for those who experience the outcome, compared to those who do not experience the outcome. For validating logistic regression CPMs, pred_validate() calculates the area under the receiver operating characteristic curve (AUC) of the CPM. For validating time-to-event (survival) CPMs, pred_validate() calculates Harrell’s C-statistic. An AUC/C-statistic of 0.5 would indicate discrimination no better than chance, whereas a value of 1 indicates perfect discrimination.
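To make the definition concrete, the AUC for a binary outcome can be computed from ranks, as the proportion of (event, non-event) pairs in which the event has the higher predicted risk. The helper `auc_rank()` below is a hypothetical function written for this illustration; it is not part of predRupdate and is not necessarily how pred_validate() computes the AUC.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formulation
auc_rank <- function(pred_risk, y) {
  r <- rank(pred_risk)          # mid-ranks, so tied predictions count as 1/2
  n1 <- sum(y == 1)             # number with the outcome
  n0 <- sum(y == 0)             # number without the outcome
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated example
set.seed(3)
lp <- rnorm(1000)
y <- rbinom(1000, 1, plogis(lp))
auc <- auc_rank(plogis(lp), y)
```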
Overall Accuracy
For validating a logistic regression model, pred_validate() also produces two additional sets of metrics of the overall accuracy of the model. The first comprises the Cox-Snell and Nagelkerke R-squared values. Here, higher R-squared values indicate better overall performance of the model.
The Cox-Snell R-squared is estimated by

$$R^2_{CS} = 1 - \exp\left(\frac{-LR}{n}\right)$$

where $LR$ is the likelihood ratio statistic, and $n$ is the size of the validation dataset. Specifically,

$$LR = -2\left(L_{null} - L_{model}\right)$$

with $L_{model}$ being the log-likelihood of the CPM on the validation dataset, and $L_{null}$ the log-likelihood of an intercept-only model in the validation dataset. Nagelkerke R-squared is then a scaled version of Cox-Snell R-squared, such that 1 is the ‘best’ value (unlike Cox-Snell R-squared, whose maximum is below 1).
Finally, the Brier score of the logistic regression CPM is also calculated, as the average squared difference between the observed outcome and the predicted risks from the model.
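The overall accuracy metrics can be sketched directly from their definitions. The example uses simulated data and illustrative object names; the Nagelkerke scaling shown (dividing by the maximum attainable Cox-Snell value, $1 - \exp(2 L_{null} / n)$) is the standard definition and is assumed, rather than confirmed, to match the package's implementation.

```r
set.seed(4)
n <- 1500
lp <- rnorm(n, mean = -1, sd = 1)    # simulated linear predictor of the existing CPM
pred_risk <- plogis(lp)
y <- rbinom(n, 1, pred_risk)         # simulated outcomes

# Log-likelihood of the CPM's predicted risks in the validation data
ll_model <- sum(y * log(pred_risk) + (1 - y) * log(1 - pred_risk))

# Log-likelihood of an intercept-only model (predicts the observed proportion for everyone)
p0 <- mean(y)
ll_null <- sum(y * log(p0) + (1 - y) * log(1 - p0))

# Likelihood ratio statistic and Cox-Snell R-squared
lr <- -2 * (ll_null - ll_model)
r2_cox_snell <- 1 - exp(-lr / n)

# Nagelkerke R-squared: Cox-Snell divided by its maximum attainable value
r2_nagelkerke <- r2_cox_snell / (1 - exp(2 * ll_null / n))

# Brier score: mean squared difference between outcome and predicted risk
brier <- mean((y - pred_risk)^2)
```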
Model Updating Methods
Upon validating an existing CPM in new data (external validation) it is not uncommon to find that the model's predictive performance deteriorates. Rather than developing a new model, an alternative strategy is to apply model updating methods. See Moons et al. 2012 and Su et al. 2018 for a detailed overview of these methods. The extent of model updating depends on the issues identified during model validation.
If the model validation identifies that the CPM has poor calibration-in-the-large, then a simple updating strategy is “intercept update”. For a logistic regression CPM, pred_update() fits the following model in the new dataset using maximum likelihood estimation:

$$\log\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = \alpha_0 + LP_i$$

Here, $LP_i$ is the linear predictor of the original model (entered as an offset), and $\alpha_0$ serves to update the intercept of the original model such that the mean predicted risk from the (updated) CPM matches the observed event proportion in the new dataset. Specifically, the new intercept will be $\hat{\alpha}_0 + \hat{\beta}_0$. All other terms ($\hat{\beta}_1, \dots, \hat{\beta}_P$) remain the same as originally published.
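A minimal sketch of the intercept update for a logistic regression CPM is given below, using glm() with the original linear predictor as an offset. The coefficients, predictor names and simulated data are hypothetical, and the code illustrates the idea rather than reproducing the internals of pred_update().

```r
set.seed(5)
n <- 5000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)

beta_hat <- c(-2.0, 0.5, 0.8)                        # hypothetical published intercept and coefficients
lp <- beta_hat[1] + beta_hat[2] * x1 + beta_hat[3] * x2

# Simulated new population with a higher baseline risk (true shift of 0.5)
y <- rbinom(n, 1, plogis(0.5 + lp))

# Intercept update: only alpha_0 is estimated; the LP enters as an offset
update_fit <- glm(y ~ 1 + offset(lp), family = binomial)
alpha0_hat <- unname(coef(update_fit))

new_intercept <- alpha0_hat + beta_hat[1]            # updated model intercept
updated_risk <- plogis(alpha0_hat + lp)              # predicted risks from the updated CPM
```

Because the intercept is estimated by maximum likelihood, the mean of `updated_risk` equals the observed event proportion `mean(y)` up to numerical tolerance.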
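Often, the original model will also demonstrate elements of over-fitting (i.e., the calibration slope of the model is <1 in the validation data). In such a case, the model can be re-calibrated in the new dataset. For a logistic regression CPM, pred_update() fits the following model in the new dataset using maximum likelihood estimation:

$$\log\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = \alpha_0 + \alpha_1 LP_i$$

where $LP_i$ is the linear predictor of the original model. This has the effect of multiplying each original model coefficient by $\hat{\alpha}_1$. Specifically, the new (updated) regression coefficients will be $\hat{\alpha}_1 \hat{\beta}_p$ for $p = 1, \dots, P$, and the new model intercept is $\hat{\alpha}_0 + \hat{\alpha}_1 \hat{\beta}_0$.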
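The re-calibration step, and the mapping back to updated coefficients, can be sketched as follows. All values are hypothetical and the data are simulated; this is an illustration under those assumptions, not the code used inside pred_update().

```r
set.seed(6)
n <- 5000
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))

beta_hat <- c(-2.0, 0.5, 0.8)                        # hypothetical published intercept and coefficients
lp <- as.vector(beta_hat[1] + X %*% beta_hat[-1])

# Simulated outcomes in which the original coefficients are too extreme
y <- rbinom(n, 1, plogis(0.3 + 0.6 * lp))

# Re-calibration: estimate an intercept and a slope on the original linear predictor
recal_fit <- glm(y ~ lp, family = binomial)
alpha_hat <- unname(coef(recal_fit))                 # c(alpha_0, alpha_1)

# Updated model parameters on the original predictor scale
new_intercept <- alpha_hat[1] + alpha_hat[2] * beta_hat[1]
new_coefs <- alpha_hat[2] * beta_hat[-1]

# Predictions from the updated coefficients match the re-calibration model
updated_risk <- as.vector(plogis(new_intercept + X %*% new_coefs))
```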
Neither of the above methods alters the discrimination of the model within the new dataset. For this, one option is to refit the model to the new dataset. Here, one would fit the following model in the new dataset using maximum likelihood estimation:

$$\log\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = \beta'_0 + \beta'_1 X_{i,1} + \dots + \beta'_P X_{i,P}$$

Here, $\hat{\beta}'_0, \dots, \hat{\beta}'_P$ become the new regression coefficients of the updated model. There is also a less extreme version of model refitting, where one only re-estimates certain parameters. This will be implemented into the package in the near future.
While we describe the updating methods above for logistic regression models, predRupdate also implements these approaches for time-to-event (survival) CPMs. The principles above are similar, except the underlying model is changed to a Cox proportional hazards model (further parametric/flexible parametric methods to be added in the future). Specifically, “intercept update” for a survival (time-to-event) CPM has the following form:

$$h(t \mid X_i) = h_0(t) \exp(LP_i)$$

which has the effect of only re-estimating the baseline hazard, $h_0(t)$, in the new data. For re-calibration, the updated survival (time-to-event) CPM would be:

$$h(t \mid X_i) = h_0(t) \exp(\alpha_1 LP_i)$$

This has the effect of multiplying each original model coefficient (log-hazard ratios) by $\hat{\alpha}_1$, and estimating a new baseline hazard in the new data.
Model Aggregation Methods
In some situations, multiple existing CPMs are available for the same prediction task (e.g., existing models developed across different countries). Here, model aggregation methods can be used to pool these existing CPMs into a single model in the new data. Various methods exist for this (Martin et al. 2017), with predRupdate currently implementing stacked regression (Debray et al. 2014) through pred_stacked_regression(). More methods will be added in the future.
Specifically, suppose there is a collection of $M$ existing logistic regression CPMs, which all aim to predict the same binary outcome, $Y$, but were developed in different populations $j = 1, \dots, M$. The models may contain different predictor variables. Each model has a linear predictor (LP) given by

$$LP_{i,j} = \hat{\beta}_{0,j} + \hat{\beta}_{1,j} X_{i,1} + \dots + \hat{\beta}_{P,j} X_{i,P}$$
In this notation, if a given variable is not included in a given CPM, then the corresponding $\hat{\beta}_{p,j}$ is equal to zero. One can then apply these models to each observation in the validation set to calculate the set of $M$ linear predictors. pred_stacked_regression() then applies stacked regression by regressing this set of linear predictors against the observed outcome in the validation dataset as

$$\log\left(\frac{P(Y_i = 1)}{1 - P(Y_i = 1)}\right) = \gamma_0 + \sum_{j=1}^{M} \gamma_j LP_{i,j}$$

where $LP_{i,j}$ is the linear predictor of existing model $j$ for individual $i$. This equation can be re-arranged to express it in terms of the set of new (aggregated) predictor coefficients for each of the $P$ predictor variables. Given that the existing models all aim to predict the same outcome, there could be a high level of collinearity in the set of $M$ linear predictors. Therefore, there have been suggestions to fit the above stacked regression model under the constraint that all of $\hat{\gamma}_1, \dots, \hat{\gamma}_M$ should be non-negative. pred_stacked_regression() implements stacked regression with and without this constraint.
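The following sketch illustrates stacked regression for two hypothetical logistic CPMs, with and without the non-negativity constraint. The unconstrained fit uses glm(); the constrained fit is obtained here by minimising the negative log-likelihood with box constraints via optim(), which is one possible approach and is an assumption of this example, not a description of how pred_stacked_regression() is implemented. All coefficients and data are made up.

```r
set.seed(7)
n <- 3000
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Two hypothetical existing CPMs: rows are c(intercept, x1, x2); model 1 omits x2 (coefficient 0)
beta_hat <- rbind(model1 = c(-1.0, 0.8, 0.0),
                  model2 = c(-1.2, 0.5, 0.6))
lp1 <- as.vector(beta_hat[1, 1] + X %*% beta_hat[1, -1])
lp2 <- as.vector(beta_hat[2, 1] + X %*% beta_hat[2, -1])

# Simulated outcomes in the new data
y <- rbinom(n, 1, plogis(-1.1 + 0.7 * X[, "x1"] + 0.4 * X[, "x2"]))

# Unconstrained stacked regression
fit_u <- glm(y ~ lp1 + lp2, family = binomial)

# Constrained stacked regression: gamma_1, gamma_2 >= 0, intercept unconstrained
negloglik <- function(g) {
  eta <- g[1] + g[2] * lp1 + g[3] * lp2
  -sum(y * eta - log1p(exp(eta)))     # negative Bernoulli log-likelihood
}
fit_c <- optim(c(0, 0.5, 0.5), negloglik, method = "L-BFGS-B",
               lower = c(-Inf, 0, 0))
gamma_hat <- fit_c$par

# Re-arrange into aggregated coefficients: c(intercept, x1, x2)
agg_coefs <- c(gamma_hat[1], 0, 0) + as.vector(gamma_hat[2:3] %*% beta_hat)
```

In this simulation both weights are expected to be positive, so the constrained and unconstrained fits should be close; the constraint matters when collinearity pushes one of the unconstrained weights below zero.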
pred_stacked_regression() can also implement stacked regression for existing time-to-event (survival) CPMs. The technical details are similar to those described above for logistic models, with the exception that the underlying model is changed to a Cox proportional hazards model.
References
- Riley RD, et al. Prognosis Research in Healthcare: Oxford University Press; 2019
- Steyerberg EW. Clinical Prediction Models: Springer New York; 2009
- Riley RD, Hayden JA, Steyerberg EW, et al. Prognosis research strategy (PROGRESS) 2: Prognostic factor research. PLoS Med 2013;10:e1001380–e1001380
- Steyerberg EW, Moons KG, van der Windt DA, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013;10:e1001381–e1001381
- Moons KGM, Kengne AP, Grobbee DE, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012;98:691-698
- Altman, D.G. and Royston, P. What do we mean by validating a prognostic model?. Statist. Med., 2000;19:453-473.
- Van Calster, B., Nieboer, D., Vergouwe, Y., et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J. Clin. Epidemiol. 2016;74:167-176
- McLernon, D.J., Giardiello, D., Van Calster, B. et al. Assessing Performance and Clinical Usefulness in Prediction Models With Survival Outcomes: Practical Guidance for Cox Proportional Hazards Models. Ann Intern Med. 2023;176:105-114
- Su T-L, Jaki T, Hickey GL, Buchan I, Sperrin M. A review of statistical updating methods for clinical prediction models. Statistical Methods in Medical Research. 2018;27(1):185-197
- Martin, G.P., Mamas, M.A., Peek, N. et al. Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models. BMC Med Res Methodol. 2017;17(1)
- Debray TP, Koffijberg H, Nieboer D, Vergouwe Y, Steyerberg EW, Moons KG. Meta-analysis and aggregation of multiple published prediction models. Stat Med. 2014;33(14):2341-62