Many clinical prediction models appear in academic literature on a monthly basis across various clinical contexts, but they often fall-short of being able to answer a key question that is of clinical interest: what is an individual’s risk of developing multiple outcomes simultaneously? In a recent paper, my co-authors and I investigate methods to unlock this hidden potential of risk models.
There are many instances across healthcare where understanding risk is important to aid clinical decision-making. For example: what is an individual’s chances of surgery being successful? What is an individual’s predicted length of stay on an intensive care unit? What is an individual’s chance of developing cardiovascular disease in the next ten years?
The answers to these types of questions can be obtained from clinical prediction models. These are mathematical models/ algorithms that take a set of characteristics/information about an individual and their health (e.g. their age, their sex, their weight etc.) as inputs, and output the risk of a certain event (e.g. development of cardiovascular disease).
There are lots of examples of these types of models, and many are used to support clinical decision-making. For example, estimation of someone’s ten-year cardiovascular risk is used to guide decision-making around prescription of statins in the UK.
However, prediction models are often developed in isolation to each other, where each model only aims to predict a single outcome. For example, one model might aim to predict cardiovascular disease, while another separate model aims to predict the risk of developing chronic kidney disease.
This situation is suitable if someone is only ever interested in predicting the risk of one event in isolation. In healthcare, however, people are often more interested in the risk of multiple events happening: i.e. “what is my risk of developing chronic kidney disease and cardiovascular disease?”.
Separate models for individual events cannot answer this question.
The reason that most current risk models cannot answer the question of “what’s my risk of developing multiple outcomes simultaneously?” comes down to independence.
Two events are independent if the occurrence or non-occurrence of one event has no effect on the chances that the other event happens. Formally, if we have two possible events, A and B, then these are independent if the probability of them both happening (written as Prob(A and B)) is equal to the probability of A happening (written as Prob(A)) multiplied by the probability of B happening (written as Prob(B)).
That is: Prob(A and B) = Prob(A) x Prob(B)
The important point is that this is only true if the two events (A and B) are independent.
How does this link with risk models and our recent paper? Well, returning to the hypothetical question of “what is my risk of developing chronic kidney disease and cardiovascular disease?” here, the two events are developing chronic kidney disease and developing cardiovascular disease, respectively. If we have two separate prediction models for each of these outcomes, then the first model estimates Prob(chronic kidney disease) and the second model estimates Prob(cardiovascular disease).
If we are interested in the probability of both outcomes occurring for an individual, then a potentially simplistic approach would be to calculate Prob(chronic kidney disease and cardiovascular disease) = Prob(chronic kidney disease) x Prob(cardiovascular disease).
However, the above tells us that this only works if the two events are independent. But here is the catch: chronic kidney disease and cardiovascular disease are unlikely to be independent – once someone develops one condition then this might put them at increased risks of developing the other. This means that we cannot use the above expression to calculate the risk of both happening.
Instead, to accurately estimate the risk of multiple events happening, we need to develop the underlying risk models in such a way to capture (i.e. model) the potential relationships (correlations) between the outcomes.
In our recent paper, my co-authors and I outline methods of doing exactly that. We show that if someone is interested in using models to answer question of “what is the risk of multiple outcomes” (formally, joint risk of multiple outcomes), then they must be developed using specific methods. These methods are called multivariate (in the sense that they model multiple outcomes). We show that developing different models for separate outcomes individually does not allow joint risk estimation in most cases – in fact, such an approach under-estimates joint risk; that is, the two models under-estimate the expected number of people with the multiple outcomes. This is a highly undesirable situation if the models are truly to support healthcare. In the paper, we call for multivariate risk models to be developed more widely in practice, whenever one is interested in estimating the joint risk of multiple events.
From a statistical perspective, this is far from a new concept. However, multivariate techniques are rarely used in prediction modelling, which is potentially a missed opportunity (Biesheuvel et al. 2008).
There are many examples across healthcare where estimating the joint risk of multiple outcomes is important and of interest from a clinical viewpoint.
For example, clinical teams consider mortality, morbidity, and quality of life in their decision-making for performing surgery. Estimating the risk of successful outcomes after surgery using prediction models should, therefore, also consider these multiple outcomes simultaneously.
As another example, with an ageing population the number of people living with multiple long-term conditions (called multi-morbidity) is increasing. Risk models could have a role to play in helping to plan for, and manage, multi-morbidity. For example, patients are likely interested in their whole health rather than specific conditions. Equally, at a healthcare-level, one might be interested in resource planning (e.g. how many people does the healthcare system expect to see with multiple conditions and how can we plan to manage this demand?). Developing multiple separate prediction models for different conditions cannot fully support these goals, as shown in our paper.
The potential of risk models to assist in these health conversations and clinical decisions is only possible if they are based on multivariate approaches.
Read more about our findings in the (open-access) paper: here
Martin, G.P., Sperrin, M., Snell, K.I.E., Buchan, I., Riley, R.D. Clinical Prediction Models to Predict the Risk of Multiple Binary Outcomes: a comparison of approaches. Statistics in Medicine (2020) https://doi.org/10.1002/sim.8787
Biesheuvel CJ, Vergouwe Y, Steyerberg EW, Grobbee DE, Moons KGM. Polytomous logistic regression analysis could be applied more often in diagnostic research. J Clin Epidemiol. 2008;61(2):125-134