Paper: Predictive performance precision analysis in medicine: identification of low-confidence predictions at patient and profile levels (MED3pa I)

Journal of the American Medical Informatics Association (JAMIA)

Date

2026-03-31

Authors

1 Computer science department, Université de Sherbrooke, Sherbrooke (QC), Canada

2 Mathematics department, Université de Sherbrooke, Sherbrooke (QC), Canada

3 Medicine department, Université de Sherbrooke, Sherbrooke (QC), Canada

4 Department of Higher Studies (CS), École nationale Supérieure d’Informatique, Alger, 16309, Algeria

5 Department of Pediatric Surgery, Montreal Children’s Hospital, Montreal, QC H4A 3J1, Canada

6 Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3J1, Canada

7 Medical Physics Unit, Department of Oncology, McGill University, Montreal, QC H3A 0G4, Canada

Abstract

Objectives Artificial intelligence models are increasingly used in health care, yet global performance metrics can mask variations in reliability across individual patients or subgroups with shared attributes, called patient profiles. This study introduces predictive performance precision analysis in medicine (MED3pa), a method that identifies when models are less reliable, allowing clinicians to better assess model limitations.

Materials and Methods We propose a framework that estimates predictive confidence using 3 combined approaches: individualized (IPC), aggregated (APC), and mixed predictive confidence (MPC). Individualized predictive confidence estimates confidence for each patient, APC assesses it across profiles, and MPC combines both. We evaluate our method on 4 datasets: 1 simulated, 2 public, and 1 private clinical dataset. Metrics by declaration rate curves show how performance changes when retaining only the most confident predictions, while interpretable decision trees reveal profiles with higher or lower model confidence.
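The idea behind a metrics by declaration rate curve can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a per-patient confidence score is already available (for example from IPC, APC, or MPC), it reports sensitivity only, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate; NaN when the retained set has no positive case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def metrics_by_declaration_rate(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    confidence: Sequence[float],
    rates: Sequence[float],
) -> List[Tuple[float, float]]:
    """For each declaration rate, keep the most confident fraction of
    predictions and recompute sensitivity on that retained subset."""
    n = len(y_true)
    # Patient indices ordered from most to least confident.
    order = sorted(range(n), key=lambda i: confidence[i], reverse=True)
    curve = []
    for rate in rates:
        k = max(1, round(rate * n))  # number of predictions retained
        kept = order[:k]
        curve.append(
            (rate, sensitivity([y_true[i] for i in kept],
                               [y_pred[i] for i in kept]))
        )
    return curve


# Synthetic toy data (assumption, for illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
confidence = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.40, 0.30, 0.20]

curve = metrics_by_declaration_rate(y_true, y_pred, confidence, [1.0, 0.8, 0.7])
for rate, sens in curve:
    print(f"declaration rate {rate:.0%}: sensitivity {sens:.3f}")
```

In this toy example the errors sit among the least confident predictions, so sensitivity rises as the declaration rate falls. That is the trade-off such a curve is meant to expose: better performance on fewer patients.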

Results We demonstrate our method in internal, temporal, and external validation settings, as well as through a clinical example. In internal validation, limiting predictions to the 93% most confident cases improved sensitivity by 14.3% and the area under the receiver operating characteristic curve by 5.1%. In the clinical example, MED3pa identified a patient profile with high misclassification risk, demonstrating its potential for safer deployment.

Discussion By identifying low-confidence predictions, our framework improves model reliability in clinical settings. It can be integrated into decision support systems to help clinicians make more informed decisions. Confidence thresholds help balance model performance with the proportion of patients for whom predictions are considered reliable.

Conclusion Better leveraging confidence in model predictions could improve reliability and trustworthiness, supporting safer and more effective use in health care.
