Background/objectives: Large language models (LLMs) make health information instantly accessible. However, LLM-based chatbots do not all provide the same level of accuracy or detail. In pediatric orthopedics, where parents often have urgent concerns about knee deformities (bowlegs and knock knees), the accuracy and dependability of these chatbots can affect parents' decisions to seek treatment. The goal of this study was to analyze how AI chatbots address parental concerns regarding pediatric knee deformities.
Methods: A set of twenty standardized questions, ten each on bowlegs and knock knees, was developed from literature reviews, analysis of parental discussion forums, and expert consultations. Each of three chatbots (ChatGPT, Gemini, and Copilot) was asked the same set of questions. Five pediatric orthopedic surgeons then rated each response for accuracy, clarity, comprehensiveness, and the degree of misleading information on a scale of 1-5. Inter-rater reliability was calculated using intraclass correlation coefficients (ICCs), and differences among the chatbots were assessed using a Kruskal-Wallis test with post hoc pairwise comparisons.
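As an illustration of the between-chatbot comparison, the following is a minimal sketch of a tie-corrected Kruskal-Wallis test for three groups of 1-5 ratings. It is not the study's analysis code: the abstract does not name the software or the post hoc procedure used, the function names are hypothetical, and the ratings in the usage lines are invented purely for demonstration. With three groups the test has two degrees of freedom, so the chi-square tail probability reduces to exp(-H/2) and no statistics library is needed. Post hoc pairwise comparisons and ICCs are not covered here.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, p) for exactly three groups, with tie correction."""
    if len(groups) != 3:
        raise ValueError("this sketch assumes exactly three groups (df = 2)")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction, which matters for ordinal 1-5 ratings.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all ratings are identical")
    h /= correction

    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p


# Invented ratings for illustration only; these are NOT study data.
chatbot_a = [5, 4, 4, 5, 4, 5, 4, 4]
chatbot_b = [4, 5, 4, 4, 5, 4, 4, 5]
chatbot_c = [3, 3, 4, 2, 3, 4, 3, 3]
h_stat, p_value = kruskal_wallis([chatbot_a, chatbot_b, chatbot_c])
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A significant result from this omnibus test only indicates that at least one chatbot differs; identifying which pairs differ requires the post hoc comparisons mentioned above.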
Results: Inter-rater reliability was moderate to good for all three chatbots. ChatGPT and Gemini scored higher than Copilot for accuracy and comprehensiveness (p < 0.05). However, no notable differences were found in clarity or in the likelihood of providing misleading information. Overall, ChatGPT and Gemini gave more detailed and precise responses, while Copilot was comparably clear but less thorough.
Conclusions: Performance in providing pediatric orthopedic information differed notably across the AI chatbots, although the results indicate evolving potential. ChatGPT and Gemini were more accurate and comprehensive than Copilot. These results highlight the continuing need for real-time supervision and stringent validation when chatbots are used in pediatric healthcare.
Keywords: AI chatbots; health information accuracy; knee deformities; parental concerns.