Background: Undifferentiated arthritis (UA) often develops into rheumatoid arthritis (RA), but predicting disease progression from seronegative UA remains challenging because seronegative RA often does not meet the classification criteria. This study aims to build a machine learning (ML) model to predict the progression from seronegative UA to RA using clinical and laboratory parameters.
Methods: The KURAMA cohort (training dataset) and the ANSWER cohort (validation dataset) were used. Patients with seronegative UA were selected based on specific inclusion and exclusion criteria. Clinical and laboratory parameters, including demographic data, acute phase reactants, autoantibodies, and physical examination findings, were collected. Several ML models, including a Feedforward Neural Network (FNN), were developed and compared. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and other metrics. SHapley Additive exPlanations (SHAP) values were computed to interpret the importance of each variable.
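As an illustration of the two headline metrics named above, the following minimal Python sketch computes AUC and sensitivity from predicted probabilities. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and no cohort data are used.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, scores, threshold=0.5):
    """True-positive rate: the share of true progressors flagged at the
    given threshold (the threshold is an assumed value, not the study's)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not positives:
        raise ValueError("need at least one positive case")
    true_pos = sum(1 for s in positives if s >= threshold)
    return true_pos / len(positives)


# Toy data, invented for illustration: 1 = progressed to RA, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc_value = roc_auc(y_true, scores)
sens_value = sensitivity(y_true, scores)
print(f"AUC = {auc_value:.3f}, sensitivity = {sens_value:.3f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a rank-based implementation would scale better on larger datasets.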
Results: The KURAMA cohort included 210 patients with seronegative UA, of whom 57 (27.1%) progressed to RA. The FNN model demonstrated the highest predictive performance, with an AUC of 0.924 and a sensitivity of 80.7% in the training dataset. Validation with the ANSWER cohort (140 patients, of whom 32.1% progressed to RA) showed an AUC of 0.777 and a sensitivity of 77.8%. MMP-3 had the highest impact on the model.
Conclusions: The FNN model exhibited robust performance in predicting progression from seronegative UA to RA and maintained substantial sensitivity in an independent validation cohort. This model, which uses only clinical and laboratory parameters, has potential for predicting progression to RA in patients with seronegative UA.
© 2025. The Author(s).