
Artificial intelligence (AI) is increasingly shaping modern-day healthcare.(1, 2) One of its most visible applications is the use of large language model-driven chatbots to offer health-related advice.(3) From helping clinicians with screening decisions to answering patients' questions about treatment and prevention, these chatbots have become a growing topic of interest. Yet the surge in studies evaluating chatbot health advice underscores a critical problem: the way these studies are conducted and reported is often inconsistent, making results difficult to interpret and potentially jeopardizing patient safety.(4)
In the year after the release of ChatGPT in late 2022, more than 130 studies assessing chatbot health advice were published, yet many failed to report even basic details such as the chatbot model version, how prompts were created, or what standards were used to evaluate response quality. Without such transparency, the validity of results becomes questionable, increasing the risk of misinterpretation or harm.(5) Recognizing this concern, a global group of experts spanning medicine, AI, methodology, ethics, and publishing came together to create the Chatbot Assessment Reporting Tool (CHART), which provides clear guidance on the expectations for systematic reporting of these studies.(4)
The CHART statement was developed through a systematic review of existing studies to identify reporting gaps, a global Delphi consensus process involving over 500 stakeholders, panel discussions with about 50 experts, and pilot testing to ensure adaptability. This collaborative effort produced a 12-item checklist, with 39 subitems, designed to standardize the reporting of chatbot health advice studies. The items span every stage of a research report: stating clearly in the title and abstract that the study evaluates chatbot health advice, describing the model version and its accessibility, and explaining how prompts were developed and how the chatbot was queried. They also call for defining performance standards, detailing assessment methods, and presenting results clearly, including deviations from established medical evidence and any harmful or biased responses. Items related to open science are equally important, covering disclosure of conflicts of interest, funding sources, ethical approval, safeguards for patient data, and whether datasets and code are accessible for verification.(4)
By promoting comprehensive reporting, the CHART statement seeks to build trust and consistency in a fast-evolving area of medical research. Transparent methods enable other researchers to reproduce findings, give clinicians and policymakers confidence in the generalizability of chatbot-generated advice, and provide journal editors and reviewers with a standardized framework for assessing overall study quality. Just as established reporting guidelines such as CONSORT for randomized controlled trials (RCTs) and STROBE for observational studies improved the quality of health research, CHART is expected to raise the standards of this new domain.(4)
Notably, CHART has been conceived as a living guideline. Given the rapid evolution of generative AI, with multimodal models and fine-tuned systems constantly emerging, the checklist will be updated regularly to maintain its relevance and robustness. Extensions are also planned to adapt the guideline to other study designs, such as RCTs or longitudinal cohort studies involving chatbot interventions.(4)
Finally, the CHART statement is more than a checklist; it is guidance for responsibly incorporating generative AI into healthcare research. By promoting transparency, methodological precision, and accountability, it helps ensure that chatbot health advice studies are not only scientifically rigorous but also safe, reproducible, and relevant to patients, clinicians, and the wider public.
References
1. Kolbinger FR, Veldhuizen GP, Zhu J, et al. Reporting guidelines in medical artificial intelligence: a systematic review and meta-analysis. Commun Med. 2024; 4:1.
2. Han R, Acosta JN, Shakeri Z, et al. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digit Health. 2024; 6:e367–73.
3. Huo B, Cacciamani GE, Collins GS, et al. Reporting standards for the use of large language model-linked chatbots for health advice. Nat Med. 2023; 29:2988.
4. Huo B, Collins G, Chartash D, et al.; CHART Collaborative. Reporting guideline for Chatbot Health Advice studies: the CHART statement. BMC Med. 2025; 23(1):447.
5. Huo B, Boyle A, Marfo N, et al. Large language models for chatbot health advice studies: a systematic review. JAMA Netw Open. 2025; 8:e2457879.

