Background: Acute graft-versus-host disease (aGVHD) is a major post-transplantation complication and one of the most significant causes of non-relapse-related death. However, the volume and complexity of clinical data make aGVHD difficult to predict. Machine learning (ML), a branch of artificial intelligence, has been introduced into medicine because it can process complex, high-dimensional variables quickly and capture nonlinear relationships. However, the effect of immunosuppressant exposure has not been considered in previous ML models. The purpose of this study was therefore to develop and optimize Cox regression and machine learning models for predicting the risk of aGVHD, with cyclosporin A exposure and common clinical factors included as variables.
Methods: The data were first preprocessed and then randomly split into a training set and a validation set at an 8:2 ratio. A Cox regression model was constructed on the training set. For the machine learning models, correlation analysis and recursive feature elimination were used for feature screening before model development. Fifteen algorithms were then used to build models, and an ensemble model was built by soft voting over the five best-performing algorithms. The area under the curve (AUC) was the main metric used to evaluate model performance in the validation set, and a nomogram and SHAP were applied to interpret the variables.
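As a rough illustration of the soft-voting step, the sketch below ranks candidate models by AUC, keeps the best k, and averages their predicted probabilities. It is a minimal pure-Python sketch, not the study's implementation: the function names (`auc_score`, `top_k_models`, `soft_vote`), the toy probabilities, and the choice of which data are used for ranking are all assumptions made for illustration.

```python
from statistics import mean


def auc_score(y_true, y_prob):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_models(model_probs, y_true, k=5):
    """Return the names of the k models with the highest AUC (hypothetical helper)."""
    ranked = sorted(model_probs, key=lambda m: auc_score(y_true, model_probs[m]), reverse=True)
    return ranked[:k]


def soft_vote(model_probs, names):
    """Soft voting: average the positive-class probability across the chosen models."""
    n_cases = len(model_probs[names[0]])
    return [mean(model_probs[m][i] for m in names) for i in range(n_cases)]


# Toy data, invented for illustration only (1 = event, 0 = no event).
y_true = [1, 0, 1, 0]
model_probs = {
    "model_a": [0.9, 0.2, 0.8, 0.1],
    "model_b": [0.6, 0.7, 0.4, 0.3],
    "model_c": [0.7, 0.4, 0.6, 0.5],
}

best = top_k_models(model_probs, y_true, k=2)
ensemble_prob = soft_vote(model_probs, best)
ensemble_auc = auc_score(y_true, ensemble_prob)
```

In practice the ranking would be done on data kept separate from the final evaluation set, so that the reported ensemble AUC is not inflated by the selection step.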
Results: A total of 479 patients and 47 variables were included in the study. The incidence of grade II-IV aGVHD was 33.61%. The AUC of the Cox regression model in the validation set was 0.625. In contrast, the new ensemble model showed better predictive ability (AUC = 0.776, accuracy = 0.729, precision = 0.667, recall = 0.375, F1-score = 0.480). In addition to variables identified in previous studies, some rarely reported risk factors were found, such as quinolone, blood urea nitrogen and alkaline phosphatase.
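The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is chosen here for illustration and is not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for the ensemble model in the validation set.
reported_precision = 0.667
reported_recall = 0.375

f1 = f1_score(reported_precision, reported_recall)
f1_rounded = round(f1, 3)
```

The rounded value agrees with the reported F1-score of 0.480. The low recall relative to precision indicates that, at the threshold used, the model misses a sizeable share of true grade II-IV cases while being fairly reliable when it does flag a patient.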
Conclusions: A new ensemble model with promising accuracy was established to predict grade II-IV classic aGVHD in allo-HSCT patients. It may help identify high-risk patients at an early stage and thereby help reduce the incidence of aGVHD.
Clinical trial number: Not applicable.
Keywords: Acute graft-versus-host disease; Allogeneic haematopoietic stem cell transplantation; Ensemble model; Machine learning; Prediction model.
© 2025. The Author(s).