Benchmarking large language models for replication of guideline-based PGx recommendations
We evaluated the ability of large language models (LLMs) to generate clinically accurate pharmacogenomic (PGx) recommendations aligned with CPIC guidelines. Using a benchmark of 599 curated gene-drug-phenotype scenarios, we compared five leading models, includ... ...