Background: Despite the promising prospects of artificial intelligence and machine learning (ML) for comprehensive disease analysis, few of the models developed so far have been applied in clinical practice because of their complexity and the lack of clear explanations for their predictions. In contrast to previous studies with small sample sizes and limited model interpretability, we developed a transparent eXtreme Gradient Boosting (XGBoost)-based model, supported by multicenter data, that uses patients' basic information and clinical indicators to predict anastomotic leakage (AL) after rectal cancer resection. The model demonstrated robust predictive performance and identified clinically relevant thresholds, which may assist physicians in optimizing perioperative management.
Aim: To develop an interpretable ML model that accurately predicts the probability of AL after rectal cancer resection, and to define clinical alert values for serum calcium ion levels.
Methods: We retrospectively collected data from patients who underwent anterior resection for rectal carcinoma between January 2011 and December 2021 at the Department of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, and Shaanxi Provincial People's Hospital. Ten ML algorithms were used to analyze the data and develop predictive models. Model performance was evaluated using receiver operating characteristic (ROC) curves, calibration curves, decision curve analysis, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. The SHapley Additive exPlanations (SHAP) algorithm was used to explain feature importance in the optimal model.
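The abstract does not state which software computed these metrics. As a hedged illustration only, the minimal pure-Python sketch below shows the conventional definitions of the listed metrics, assuming binary labels (1 = AL, 0 = no AL), predicted probabilities, and an illustrative classification threshold of 0.5. The function names `binary_metrics`, `roc_auc`, and `net_benefit` are hypothetical, and the toy labels and scores are invented for demonstration, not taken from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary outcome (hypothetical helper; 1 = AL)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "ppv": ratio(tp, tp + fp),           # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),           # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.3, 0.8]
    y_pred = [int(s >= 0.5) for s in scores]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores), net_benefit(y_true, scores, 0.5))
```

Computing AUC by pairwise comparison is quadratic in sample size but makes its probabilistic meaning explicit. A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities.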
Results: Ten features were used to construct the predictive models and identify the optimal one. XGBoost performed best, with an area under the ROC curve (AUC) of 0.984 (95% confidence interval: 0.972-0.996) in the test set (accuracy: 0.925; sensitivity: 0.92; specificity: 0.927). In external validation, the model achieved an AUC of 0.703. SHAP analysis revealed that the serum calcium ion level was the most important factor influencing the model's predictions.
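SHAP attributes each prediction to the input features using Shapley values. The minimal sketch below computes exact Shapley values by enumerating all feature coalitions, under the simplifying assumption that a single baseline vector stands in for "absent" features; SHAP implementations typically average over a background distribution instead, and for tree ensembles such as XGBoost the `shap` library uses a specialized polynomial-time tree algorithm. The function `baseline_shapley` and the toy model `f` are hypothetical and are not the authors' model or code.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def baseline_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition members keep their observed value; all other features use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy model: one additive feature plus a two-feature interaction.
    def f(z: Sequence[float]) -> float:
        return 2 * z[0] + z[1] * z[2]

    phi = baseline_shapley(f, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
    print(phi)  # efficiency: sum(phi) equals f(x) - f(baseline)
```

The values satisfy the efficiency property (they sum to the difference between the prediction and the baseline prediction), and the interaction term is split equally between the two features involved. Exhaustive enumeration grows as 2^n, which stays manageable for a ten-feature model but motivates specialized algorithms for larger ones.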
Conclusion: Using clinical data, we built a superior predictive model based on XGBoost, the most effective of the ten algorithms evaluated. By predicting the occurrence of AL in patients after rectal cancer resection, the model identified the significant role of serum calcium ion levels, providing guidance for clinical practice. The integration of SHAP provides a clear interpretation of the model's predictions.
Keywords: Anastomotic leakage; Machine learning; Rectal cancer; SHapley Additive exPlanations algorithms.
©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.