
Artificial intelligence (AI) increasingly shapes modern-day healthcare.(1, 2) One of the most common applications is the use of large language model-driven chatbots to offer health-related advice.(3) From helping clinicians with screening decisions to answering patients' questions on treatment and prevention, these chatbots are attracting growing interest. Yet the surge of studies evaluating chatbot health advice underscores a critical problem: the way these studies are performed and reported is often inconsistent, making results difficult to interpret and potentially jeopardizing patient safety.(4)
In the year after ChatGPT's release in late 2022, over 130 studies assessing chatbot health advice were published, yet many failed to report even basic details such as the chatbot model version, how prompts were created, or what standards were used to evaluate response quality. Without such transparency, the validity of results becomes questionable, increasing the risk of misinterpretation or harm.(5) Recognizing this concern, a group of global experts across disciplines, including medicine, AI, methodology, ethics, and publishing, came together to create the Chatbot Assessment Reporting Tool (CHART), clear guidance on the expectations for systematic reporting of these studies.(4)
The CHART statement was developed through a systematic review of existing studies to identify gaps, a global Delphi consensus process involving over 500 stakeholders, panel discussions with about 50 experts, and pilot testing to ensure usability. This collaborative effort produced a 12-item checklist, with 39 subitems, designed to standardize the reporting of chatbot health advice studies. The items span every stage of a research report: stating clearly in the title and abstract that the study evaluates chatbot health advice, describing the model version and its accessibility, and explaining how prompts were developed and what approaches were used to query the chatbot. They also highlight the need to define performance standards, detail assessment methods, and present results clearly, including deviations from established medical evidence and any potentially harmful or biased responses. Items related to open science are equally important, covering disclosure of conflicts of interest, funding sources, ethical approval, safeguards for patient data, and whether datasets and code are available for verification.(4)
By supporting comprehensive reporting, the CHART statement seeks to build trust and consistency in a fast-evolving area of medical research. Transparent methods enable other researchers to reproduce findings, give clinicians and policymakers confidence in the generalizability of chatbot-generated advice, and provide journal editors and reviewers with a standardized framework for assessing overall study quality. Just as existing reporting guidelines such as CONSORT for randomized controlled trials (RCTs) and STROBE for observational studies raised the quality of health research, CHART is expected to advance the standards of this new domain.(4)
Notably, CHART has been designed as a living guideline. Given the rapid evolution of generative AI, with new multimodal models and fine-tuned systems constantly emerging, the checklist will be updated regularly to remain relevant and robust. Extensions are also planned to adapt the guideline to other study designs, such as RCTs or longitudinal cohort studies involving chatbot interventions.(4)
Finally, the CHART statement is more than a checklist; it is guidance for responsibly incorporating generative AI into healthcare research. By promoting transparency, methodological precision, and accountability, it helps ensure that chatbot health advice studies are scientifically rigorous as well as safe, reproducible, and valuable for patients, clinicians, and the wider public.
References
- Kolbinger FR, Veldhuizen GP, Zhu J, et al. Reporting guidelines in medical artificial intelligence: a systematic review and meta-analysis. Commun Med. 2024; 4:1.
- Han R, Acosta JN, Shakeri Z, et al. Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review. Lancet Digit Health. 2024; 6:e367–73.
- Huo B, Cacciamani GE, Collins GS, et al. Reporting standards for the use of large language model-linked chatbots for health advice. Nat Med. 2023; 29:2988.
- Huo B, Collins G, Chartash D, et al.; CHART Collaborative. Reporting guideline for Chatbot Health Advice studies: the CHART statement. BMC Med. 2025; 23(1):447.
- Huo B, Boyle A, Marfo N, et al. Large language models for chatbot health advice studies: a systematic review. JAMA Netw Open. 2025; 8:e2457879.

Artificial Intelligence (AI) refers to a computerized system that performs physical tasks, carries out cognitive functions, solves problems, and/or makes decisions without explicit human instructions.[1] First proposed by McCarthy in 1955, the concept of AI has been applied in many health-related areas, including clinical research, hospital care, drug development, disease diagnosis, prognosis, and treatment monitoring. Advances in research have brought low-cost computational resources, driving the digitalization of healthcare, innovation in routine examinations, and improvements in the overall quality of treatment. AI has become an important element of medical diagnosis, for example in the assessment of skin lesions, detection of diabetic retinopathy, and interpretation of chest X-rays. In addition, AI has great value in helping clinicians improve the quality and safety of healthcare delivery.[2]
The 1990s saw the Internet and the World Wide Web enter commercial markets as a result of major advances in Information Technology (IT). Another leap followed when Internet-connected mobile devices took off in the late 2000s. Today, we are in the middle of the next major transition: the next generation of intelligent IT.(1)
Artificial intelligence (AI) has no universally agreed definition. It broadly encompasses computing technologies that resemble processes associated with human intelligence, e.g. reasoning, learning and adaptation, sensory understanding, and interaction.(1,2) At present, most AI applications are narrow, able only to perform specific tasks or solve pre-defined problems.(3) Recently, AI has been gaining significance in healthcare. The AI industry is estimated to be worth $6 billion by 2021.(4) A recent McKinsey review projected that healthcare would be among the top five industries adopting AI.(5)