Emulating real-world GLP-1 efficacy in type 2 diabetes through causal learning and virtual patients
PLOS Digital Health

Summary
This computational study developed and validated an AI-based framework for emulating clinical trials in silico and used it to assess the effectiveness of diabetes treatments. Researchers trained a generative AI model on real-world data from 5,476 people with type 2 diabetes to emulate the LEAD-5 randomized controlled trial, which compared a GLP-1 receptor agonist (liraglutide) against basal insulin (glargine) and placebo. The model reproduced LEAD-5's key findings, ranking GLP-1 above basal insulin and placebo for reducing HbA1c, body mass index, and systolic blood pressure. In the virtual trials, GLP-1 reduced HbA1c by 1.21 mmol/mol more than basal insulin and by 2.58 mmol/mol more than placebo (both p<0.001), matching the treatment ranking of the original trial. The model also captured GLP-1's advantages for weight management and blood pressure control. When the researchers used the model to explore counterfactual scenarios, extending predictions to patients outside the strict LEAD-5 inclusion criteria, GLP-1 was not the most effective choice for every patient phenotype. This suggests that causal AI-based trial emulation could help personalize treatment recommendations to individual patient characteristics, beyond what traditional randomized controlled trials can show. The framework is a candidate tool for post-market surveillance, health technology assessment, and studying treatment effectiveness in the diverse real-world populations typically excluded from clinical trials.
Study Design
Methodology
This computational validation study employed a novel AI-powered virtual clinical trial emulation framework combining generative adversarial networks (GANs) with causal structure learning. The model was trained on the SCI-Diabetes dataset from the Glasgow Safe Haven platform, comprising 5,476 people with type 2 diabetes. The generative model architecture integrated Wasserstein GAN with gradient penalty (WGAN-GP) and NOTEARS-MLP causal learning methods to construct directed acyclic graphs (DAGs) representing causal relationships among treatments, outcomes, and confounding variables.
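As a rough illustration of the acyclicity idea behind NOTEARS-style causal learning, the sketch below evaluates the standard NOTEARS penalty h(W) = tr(exp(W∘W)) - d, which equals zero exactly when the weighted adjacency matrix W describes a DAG. It is not the study's code. The matrix exponential is approximated by a truncated Taylor series in plain Python, and the function names and example matrices are hypothetical. In NOTEARS-MLP, W is derived from the network's first-layer weights rather than supplied directly.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=20):
    """NOTEARS penalty h(W) = tr(exp(W∘W)) - d.

    exp() is approximated by the first `terms` Taylor terms (an assumption
    made for this sketch; a library matrix exponential would normally be used).
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]            # elementwise square W∘W
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)                                   # tr(A^0) = tr(I) = d
    for k in range(1, terms + 1):
        power = matmul(power, a)                       # A^k
        trace += sum(power[i][i] for i in range(d)) / math.factorial(k)
    return trace - d


# Hypothetical 3-node chain (treatment -> mediator -> outcome): a valid DAG.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]

# Hypothetical 2-node graph with a feedback loop: not a DAG.
cyclic = [[0.0, 1.0],
          [1.0, 0.0]]

print(acyclicity(dag))      # zero for a DAG
print(acyclicity(cyclic))   # strictly positive when a cycle exists
```

An augmented Lagrangian optimizer adds this penalty, with a growing multiplier, to the training loss so that the learned graph is pushed toward h(W) = 0.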
The training process optimized both data reconstruction quality through distance minimization between real and synthetic data, and causal graph validity through augmented Lagrangian optimization ensuring acyclicity constraints. Learning rates started at 3×10⁻⁴ with cosine annealing schedules and warm restarts every 300 epochs. Virtual trials sampled n=232 patients per treatment arm matching LEAD-5 inclusion criteria, with baseline characteristics including age 57.6±9.5 years, BMI 30.4±5.3 kg/m², and HbA1c 67.2±7.5 mmol/mol. Treatment arms consisted of GLP-1 receptor agonists, basal insulin, and placebo (proton pump inhibitors or statins as inactive comparators), all combined with metformin and sulfonylurea background therapy. Pre-treatment features were collected from 9 months prior to intervention; post-treatment measurements were obtained over 12 months using median values. Difference-in-differences analysis quantified pairwise treatment comparisons with 95% confidence intervals obtained via bootstrapping over 60 random seeds.
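The difference-in-differences comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the (pre, post) HbA1c values are invented toy numbers, the function names are hypothetical, and the interval comes from a generic percentile bootstrap that resamples patients, which may differ from the study's bootstrapping over 60 random seeds.

```python
import random
import statistics


def did(arm_a, arm_b):
    """Mean (post - pre) change in arm A minus mean change in arm B.

    Each arm is a list of (pre, post) measurement pairs.
    """
    change_a = statistics.fmean(post - pre for pre, post in arm_a)
    change_b = statistics.fmean(post - pre for pre, post in arm_b)
    return change_a - change_b


def bootstrap_ci(arm_a, arm_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference-in-differences."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement within each arm.
        resample_a = rng.choices(arm_a, k=len(arm_a))
        resample_b = rng.choices(arm_b, k=len(arm_b))
        estimates.append(did(resample_a, resample_b))
    estimates.sort()
    lower = estimates[int(round((alpha / 2) * n_boot))]
    upper = estimates[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Toy (pre, post) HbA1c values in mmol/mol; illustrative only, not study data.
glp1_arm = [(70, 60), (68, 60), (66, 58)]
insulin_arm = [(70, 65), (68, 64), (66, 61)]

estimate = did(glp1_arm, insulin_arm)
lower, upper = bootstrap_ci(glp1_arm, insulin_arm)
print(estimate, lower, upper)
```

A negative estimate means the first arm's outcome fell further than the second arm's, which is the direction reported for GLP-1 against both comparators.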
Interventions
The study evaluated three treatment arms designed to emulate the LEAD-5 trial design. The primary intervention arm received GLP-1 receptor agonists (class-level modeling including liraglutide), which work by stimulating insulin secretion, suppressing glucagon release, slowing gastric emptying, and promoting satiety. The comparator arm received basal insulin therapy (specifically glargine in LEAD-5), a long-acting insulin analog providing 24-hour glucose control. The control arm received placebo, operationalized in this real-world data context as proton pump inhibitors (PPIs) or statins serving as inactive comparators that are commonly prescribed in type 2 diabetes populations but do not directly affect glycemic outcomes.
All three treatment arms were administered in combination with metformin and sulfonylurea background therapy, consistent with LEAD-5 protocols. The computational model treated interventions as drug class categories rather than specific medications to account for confounding effects of other prescribed drugs and to model global causal effects within each therapeutic class. Treatment assignment in virtual trials was randomized to eliminate confounding, mirroring RCT methodology. The model controlled for known confounders including patient demographics (age, sex), pre-treatment clinical measurements (baseline HbA1c, BMI, blood pressure, kidney function, lipid profiles), and treatment history through d-separation in the causal directed acyclic graph.
Key Findings
The virtual trial replicated the LEAD-5 treatment hierarchy: GLP-1 receptor agonists were superior to both basal insulin and placebo on all three outcomes. For HbA1c, GLP-1 lowered values by 1.21 mmol/mol (0.11%) more than basal insulin and by 2.58 mmol/mol (0.24%) more than placebo (both p<0.001). For body mass index, the additional reduction with GLP-1 was 0.79 kg/m² versus basal insulin and 0.61 kg/m² versus placebo (both p<0.001). For systolic blood pressure, it was 2.99 mmHg versus basal insulin and 2.38 mmHg versus placebo (both p<0.001). Absolute effect sizes differed from the original LEAD-5 trial because of methodological differences (drug classes rather than specific medications, BMI rather than bodyweight), but the model identified the correct ranking and direction of effect for every comparison. Counterfactual analyses that extended predictions beyond the LEAD-5 inclusion criteria revealed treatment heterogeneity: GLP-1 was not uniformly the most effective intervention across real-world patient phenotypes, and some subgroups benefited more from basal insulin or even from the placebo comparators. This highlights the importance of selecting treatment according to individual patient characteristics.
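The HbA1c differences above are quoted in both mmol/mol (IFCC units) and percent (NGSP units). The short sketch below shows how the two relate, using the standard IFCC-NGSP master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152. The function names are hypothetical. For a difference between two arms the intercept cancels, so only the slope applies.

```python
SLOPE = 0.09148      # NGSP percent per IFCC mmol/mol (master equation)
INTERCEPT = 2.152    # NGSP percent offset (master equation)


def ifcc_to_ngsp(mmol_per_mol):
    """Convert an absolute HbA1c value from mmol/mol to percent."""
    return SLOPE * mmol_per_mol + INTERCEPT


def ifcc_diff_to_ngsp(diff_mmol_per_mol):
    """Convert a difference between two HbA1c values; the intercept cancels."""
    return SLOPE * diff_mmol_per_mol


# Between-arm differences reported for the virtual trial.
print(round(ifcc_diff_to_ngsp(-1.21), 2))   # GLP-1 vs basal insulin
print(round(ifcc_diff_to_ngsp(-2.58), 2))   # GLP-1 vs placebo

# Mean baseline HbA1c of the virtual cohort, 67.2 mmol/mol, in percent.
print(round(ifcc_to_ngsp(67.2), 1))
```

Applying the full equation, intercept included, to a between-arm difference would overstate it by about 2.15 percentage points.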
Comparison with other Studies
The virtual trial emulation results align directionally with the established LEAD-5 trial findings and broader GLP-1 literature. The original LEAD-5 study by Russell-Jones et al. (2009) demonstrated liraglutide superiority over insulin glargine for HbA1c reduction (-2.62 mmol/mol difference, p=0.0015) and bodyweight reduction (-3.43 kg difference, p<0.0001), which this AI model successfully replicated in treatment ranking despite smaller absolute effect sizes. The LEADER trial (Marso et al., 2016) established cardiovascular benefits of liraglutide beyond glycemic control, consistent with this study's finding of superior systolic blood pressure reduction with GLP-1 versus basal insulin.
This study's novel contribution lies in its computational methodology rather than its clinical findings. Previous target trial emulation studies using observational data (Hernán & Robins, 2016) have relied on statistical adjustment methods, whereas this approach uses generative AI and causal learning to create virtual patient populations. The counterfactual analyses, which revealed treatment heterogeneity across patient phenotypes, align with growing evidence on the importance of precision medicine approaches in diabetes management (Nowakowska et al., 2019). The study's demonstration that RCT results may not extrapolate uniformly to broader real-world populations supports calls for more inclusive trial designs and complementary real-world evidence generation. As regulatory agencies (FDA, EMA) increasingly recognize AI-generated evidence for decision-making, this framework offers a scalable approach to post-market surveillance and health technology assessment.
Journal Reference
MacLellan CR, Petkov H, McKeag C, et al. Emulating real-world GLP-1 efficacy in type 2 diabetes through causal learning and virtual patients. PLOS Digit Health. 2025;4(7):e0000927. doi:10.1371/journal.pdig.0000927
© 2026 deDiabetes. Licensed under CC BY (Attribution)