
Real-world evidence (RWE) and real-world data (RWD) have gained prominence in recent years, opening new avenues in clinical research, especially in settings where conventional randomized controlled trials (RCTs) are infeasible or ethically challenging. One such innovation is the external control arm (ECA), which uses retrospective patient data as the comparator for a single-arm clinical study. This approach is particularly valuable in rare diseases, oncology, and other therapeutic areas with high unmet need, where recruiting patients for a conventional control arm may be difficult or impossible. However, developing an effective and reliable ECA requires careful matching of baseline characteristics, outcomes, and treatment pathways. These tasks are increasingly supported by artificial intelligence (AI).(1, 2)
AI supports the development of ECAs by enabling quicker and more efficient extraction of relevant data from large, heterogeneous RWD sources, such as electronic health records (EHRs), registries, claims databases, and even free-text clinical notes. Through techniques such as natural language processing (NLP) and machine learning (ML), AI tools can harmonize data points recorded in different formats, impute missing values, and identify eligible patient populations. This helps ensure that the selected external cohort closely resembles the treatment population in terms of demographics, disease features, and prognostic factors, reducing bias and increasing the reliability of the comparison.(1-4)
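The harmonize, impute, and select steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the records, field names, unit conversion, and eligibility rule are all hypothetical, and a simple median fill stands in for the ML-based imputation the text refers to.

```python
from statistics import median

# Hypothetical raw records from two RWD sources with different conventions.
ehr_rows = [
    {"patient": "E1", "age_years": 64, "sex": "F", "weight_kg": 70.0, "ecog": 1},
    {"patient": "E2", "age_years": 58, "sex": "M", "weight_kg": None, "ecog": 0},
]
registry_rows = [
    {"id": "R1", "age": 71, "gender": "female", "weight_lb": 176.0, "ecog_ps": 2},
    {"id": "R2", "age": 49, "gender": "male", "weight_lb": 154.0, "ecog_ps": 3},
]

LB_PER_KG = 2.20462


def from_ehr(row):
    """Map an EHR record onto the common schema."""
    return {"id": row["patient"], "age": row["age_years"],
            "sex": row["sex"].upper(), "weight_kg": row["weight_kg"],
            "ecog": row["ecog"]}


def from_registry(row):
    """Map a registry record onto the common schema, converting lb to kg."""
    lb = row["weight_lb"]
    return {"id": row["id"], "age": row["age"],
            "sex": row["gender"][0].upper(),
            "weight_kg": None if lb is None else round(lb / LB_PER_KG, 1),
            "ecog": row["ecog_ps"]}


def impute_median(rows, field):
    """Fill missing values with the observed median (a simple stand-in
    for model-based imputation)."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [dict(r, **{field: fill}) if r[field] is None else r for r in rows]


def is_eligible(row):
    """Hypothetical eligibility rule: adults with ECOG status 0 to 2."""
    return row["age"] >= 18 and row["ecog"] <= 2


harmonized = [from_ehr(r) for r in ehr_rows] + [from_registry(r) for r in registry_rows]
cohort = [r for r in impute_median(harmonized, "weight_kg") if is_eligible(r)]
cohort_ids = [r["id"] for r in cohort]
print(cohort_ids)
```

In practice each of these steps would be far more involved (for example, NLP to pull performance status out of clinical notes), but the order of operations is the same: map to a common schema, handle missing values, then apply the trial's eligibility criteria.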
The benefits of AI extend beyond data selection to the rigorous statistical methods used in ECA development. Causal inference methods such as propensity score matching and inverse probability of treatment weighting, whose underlying models can be estimated with ML, as well as more flexible approaches such as generative adversarial networks (GANs), can be applied to balance baseline covariates and simulate counterfactual outcomes.(5) These methods enable researchers to estimate treatment effects more precisely, strengthen causal inference, and reduce confounding, all of which are crucial to making ECA findings credible to regulators, clinicians, and payers. Moreover, AI can help re-examine the validity of an ECA over time as clinical practice evolves, allowing the comparator to be updated accordingly.(2, 3, 5)
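As a rough illustration of inverse probability of treatment weighting, the sketch below fits a one-covariate logistic propensity model and reweights a simulated external cohort so that it resembles the trial arm. Everything here is an assumption made for illustration: the data are simulated, the single covariate is a hypothetical standardized prognostic score, and the plain gradient-descent fit stands in for whatever statistical or ML model a real analysis would use.

```python
import math
import random
from statistics import mean, pvariance


def fit_logistic(xs, ts, lr=0.5, epochs=1000):
    """Fit P(trial | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            err = 1.0 / (1.0 + math.exp(-(b0 + b1 * x))) - t
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def smd(trial, control, weights):
    """Absolute standardized mean difference, with the control arm weighted."""
    control_mean = sum(w * x for w, x in zip(weights, control)) / sum(weights)
    pooled_sd = math.sqrt((pvariance(trial) + pvariance(control)) / 2)
    return abs(mean(trial) - control_mean) / pooled_sd


# Simulated (hypothetical) prognostic scores: the external cohort is
# systematically healthier than the single-arm trial population.
rng = random.Random(0)
trial_x = [rng.gauss(0.5, 1.0) for _ in range(200)]
external_x = [rng.gauss(-0.5, 1.0) for _ in range(400)]

# Propensity model: probability of being in the trial given the covariate.
b0, b1 = fit_logistic(trial_x + external_x, [1] * 200 + [0] * 400)

# Weights targeting the trial population (ATT): each external patient
# gets e / (1 - e), where e is the estimated propensity score.
att_weights = []
for x in external_x:
    e = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    att_weights.append(e / (1.0 - e))

smd_before = smd(trial_x, external_x, [1.0] * len(external_x))
smd_after = smd(trial_x, external_x, att_weights)
print(f"SMD before weighting: {smd_before:.2f}, after: {smd_after:.2f}")
```

The standardized mean difference is a common balance diagnostic: a large value before weighting signals that a naive comparison would be confounded, and a much smaller value afterwards suggests the weighted external cohort is a more credible comparator on the measured covariate. Weighting cannot correct for confounders that were never recorded.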
However, the use of AI in ECA development has several limitations. Data quality and completeness remain major challenges, as even the most innovative algorithms cannot compensate for systematically missing or biased data. The interpretability and transparency of AI models are also crucial for regulatory acceptance and reproducibility.(2, 4) To address these concerns, stakeholders are promoting the use of explainable AI (XAI) and standardized validation frameworks to make model outputs easier to understand and more trustworthy for both scientific and non-scientific audiences.(6)
As research continues to explore the role of ECAs in drug development, the combination of AI with RWD analytics is expected to transform evidence generation. By automating complex analyses and uncovering hidden patterns in large-scale datasets, AI is helping to turn ECAs from a promising concept into a scalable, robust, and efficient solution that complements conventional clinical trial approaches. With appropriate safeguards and collaboration across disciplines, AI-driven ECAs have the potential to expedite treatment access while maintaining scientific integrity and patient-centricity.
References
- Elvatun S, Knoors D, Brant S, et al. Synthetic data as external control arms in scarce single-arm clinical trials. PLOS Digit Health. 2025 Jan 23;4(1):e0000581.
- Pasculli G, Virgolin M, Myles P, et al. Synthetic Data in Healthcare and Drug Development: Definitions, Regulatory Frameworks, Issues. CPT Pharmacometrics Syst Pharmacol. 2025 May;14(5):840-852.
- American Statistical Association: AmstatNews: External Control Arms: Key Elements. June 2022. [Accessed online on 30th July 2025]. Available at: https://magazine.amstat.org/blog/2022/06/01/external-control-arms-key-elements/
- Singh A. Medical Data Imputation: Using Generative AI to Impute Missing Values in Medical Datasets. SSRN. 2025 Mar 4. Available at: https://ssrn.com/abstract=5196881
- Kwak D, Liang Y, Shi X, et al. Comparing Machine Learning and Advanced Methods with Traditional Methods to Generate Weights in Inverse Probability of Treatment Weighting: The INFORM Study. Pragmat Obs Res. 2024 Oct 4;15:173-183.
- Ali S, Abuhmed T, El-Sappagh S, et al. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf Fusion. 2023;99:101805.

