• Certainty of Evidence in Systematic Reviews: The GRADE Approach

    When seeking answers to clinical questions about the efficacy or other aspects of a proposed intervention, health professionals look for reliable evidence. Systematic literature reviews (SLRs) serve this purpose by generating high-quality, verifiable, and trustworthy evidence in a systematic, transparent, and impartial manner.(1) In the revised evidence pyramid, systematic reviews are visualized not as the apex but as the lens through which the certainty of the primary research below them, across multiple study designs, is appraised.(1,2) Reputable professional organizations worldwide use the SLR process to generate practice guidelines for the diagnosis and management of different disorders.(1) However, since the primary research underpinning such guidelines is often of variable quality, the resulting guidelines may also be of non-uniform quality. It therefore becomes essential to grade both the quality of the evidence and the strength of the recommendations.(1,2)

    Of the different frameworks available for this purpose, the most frequently used is the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework. The GRADE approach starts with formulating the research question using the PICO elements (population, intervention, comparator, outcome) and selecting the most relevant outcomes. After an SLR has gathered the evidence for each selected outcome, the certainty (also called the level or quality) of that evidence is rated as one of four levels:(1–5)

    1. High: We are very confident that the true effect lies close to the estimate of the effect.
    2. Moderate: We are moderately confident in the estimate; the true effect is likely to be close to it, but there is a possibility that it is substantially different.
    3. Low: Our confidence in the estimate is limited; the true effect may be substantially different from the estimated effect.
    4. Very low: We have very little confidence in the estimate; the true effect is likely to be substantially different from the estimated effect.

    To begin with, evidence from randomized controlled trials (RCTs) is considered to start at a higher certainty than evidence from observational studies. The rating may then be lowered after considering the following factors:

    1. Risk of bias in individual studies: This occurs when the results of a study do not represent the truth because of inherent limitations in its design or conduct. Examples include methodological issues such as inadequate randomization, lack of blinding, confounding, and loss to follow-up.(2)
    2. Consistency of results between studies: When multiple studies show consistent effects, overlapping confidence intervals, and low levels of heterogeneity, the resulting certainty of evidence will be higher. Consistency between studies is judged from the spread of point estimates, statistical measures such as I² values, and the overlap of confidence intervals (CIs); a worked I² calculation is sketched after this list.(2,4)
    3. Indirectness of evidence: When the population for which the recommendations are being prepared differs from the population found in the included studies (for example, when the study participants are adults but the recommendations are being prepared for children), the certainty of evidence will be lower.(2)
    4. Imprecision: Imprecision reflects the range of plausible effect sizes and depends on the number of included patients/events and the width of the confidence interval. If the confidence interval around the effect is so wide that it could support different decisions, the evidence is imprecise.(2)
    5. Publication bias: Small studies with no statistically significant results are less likely to be published, and the source of funding can also influence what is published; the synthesized evidence is therefore skewed towards published results.(2)
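
    The calculation behind the I² statistic mentioned above can be illustrated with a short sketch. The following Python snippet computes Cochran's Q and I² from a handful of invented log odds ratios and standard errors; the numbers are placeholders for demonstration only, not data from any real review.

    ```python
    import numpy as np

    # Hypothetical log odds ratios and standard errors from five studies
    effects = np.array([-0.35, -0.20, -0.45, -0.10, -0.30])
    std_errors = np.array([0.15, 0.20, 0.18, 0.25, 0.12])

    # Fixed-effect (inverse-variance) pooled estimate
    weights = 1.0 / std_errors**2
    pooled = np.sum(weights * effects) / np.sum(weights)

    # Cochran's Q: weighted squared deviations of study effects from the pooled estimate
    q = np.sum(weights * (effects - pooled) ** 2)
    df = len(effects) - 1

    # I^2: proportion of total variability attributable to between-study heterogeneity
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    print(f"Pooled log OR: {pooled:.3f}")
    print(f"Cochran's Q: {q:.2f} (df = {df})")
    print(f"I^2: {i_squared:.1f}%")
    ```

    Low I² values (conventionally below about 40%) support a judgment of consistency, whereas high values point towards downgrading for inconsistency.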

    Evidence that is initially rated as ‘high’ may be downgraded after consideration of the aforementioned criteria; the opposite is also possible, since observational evidence can be upgraded, for example when the magnitude of effect is large, when there is a dose-response gradient, or when all plausible confounding would act to reduce the apparent effect.(3)

    At the end of the grading process, every outcome of interest is assigned a level of certainty. Based on the evidence so collected, recommendations are then drawn up, usually by a guideline panel. The considerations that feed into this process include the balance of benefits and risks, the certainty of the evidence, values and preferences, costs, feasibility, acceptability, and equity.(1) The resulting practice recommendations are assigned a ‘strength’ under the GRADE approach: a recommendation can be ‘strong’ or ‘conditional’, and ‘for’ or ‘against’ a specific action in a specific situation. Strong recommendations imply that most, if not all, informed people would choose the recommended course of action. Conditional (weak) recommendations imply that informed patients and clinicians are likely to vary considerably in the choices they make.(2,4)

    It is important to acknowledge that using GRADE involves subjective judgments, and assessments may vary between individuals. Despite this, the GRADE approach has become an essential component of clinical practice guidelines based on high-quality SLRs, since it provides a systematic, explicit, and transparent approach to grading the certainty of evidence and the strength of practice recommendations.

    References

    1. Granholm A, Alhazzani W, Møller MH. Use of the GRADE approach in systematic reviews and guidelines. Br J Anaesth. 2019 Nov;123(5):554–9.
    2. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008 Apr 26;336(7650):924–6.
    3. Goldet G, Howick J. Understanding GRADE: an introduction. J Evid Based Med. 2013 Feb 28;6(1):50–4.
    4. Kirmayr M, Quilodrán C, Valente B, Loezar C, Garegnani L, Franco JVA. The GRADE approach, Part 1: how to assess the certainty of the evidence. Medwave. 2021 Mar 31;21(02):e8109–e8109.
    5. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011 Apr;64(4):401–6.
  • Text Mining for Search Term Development: Aiding the Conduct of Better SLRs

    Systematic literature reviews (SLRs) are widely used to pool and present the findings of multiple studies in a dependable way and often inform policy and practice guidelines.(1) An important feature of SLRs is the application of scientific methods to identify and minimise bias and error in the selection and treatment of studies.(2) However, the sheer number of published studies, and the rate at which new ones appear, makes identifying relevant studies in an unbiased way increasingly complicated and time-consuming.(3)

    To reduce the impact of publication bias, reviewers usually try to identify all relevant research for inclusion in an SLR. This has always been challenging and laborious, and the challenge is growing as the number of databases to be searched, and the number of papers and journals published, continue to increase. Furthermore, evidence suggests an inherent North American bias in several major bibliographic databases (e.g. PubMed), so a range of smaller databases also needs to be searched in reviews that aim to maximise external validity.(4) This requires a multi-layered approach: extensive Boolean searches of electronic bibliographic databases combined with searches of specialised registers and websites.(5)
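
    As a simple illustration of how such Boolean search strings are assembled, the sketch below joins synonyms within each concept with OR and combines the concepts with AND. The terms, and the resulting query, are placeholders rather than a validated search strategy, and real strategies also handle database-specific field tags and truncation.

    ```python
    # Placeholder synonym lists for two concepts of a hypothetical review question
    condition_terms = ["low back pain", "lumbago", "backache"]
    intervention_terms = ["exercise therapy", "physiotherapy", "physical therapy"]

    def or_block(terms):
        """Join synonyms with OR, quoting multi-word phrases."""
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"

    # Combine the concept blocks with AND to form the final Boolean query
    query = " AND ".join([or_block(condition_terms), or_block(intervention_terms)])
    print(query)
    # ("low back pain" OR lumbago OR backache) AND ("exercise therapy" OR physiotherapy OR "physical therapy")
    ```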

    Unfortunately, sensitive electronic searches of bibliographic databases have low specificity. Consequently, reviewers often end up manually looking through many thousands of irrelevant titles and abstracts to identify the much smaller number of relevant ones, a task known as ‘screening’.(6) An experienced reviewer typically takes between 30 seconds and a few minutes to evaluate a citation, so screening 10,000 citations represents a considerable workload (at one minute per citation, roughly 167 hours).(7) At the same time, reviews intended to inform policy and practice must be completed to (often short) timetables and within limited budgets, yet they must remain comprehensive to reflect accurately the state of knowledge in a given area.(5)

    Text mining has been suggested as a potential solution to these practical issues, because automating part of the screening process can save time.(5) Text mining is defined as ‘the process of discovering knowledge and structure from unstructured data (i.e., text)’.(8,9) There are two particularly promising ways in which text mining can support screening in SLRs: i) by prioritising the list of items for manual screening, so that the studies most likely to be relevant appear at the top of the list; and ii) by using the include/exclude categories assigned manually to a subset of studies to train a classifier that then applies those categorisations automatically.(10) Prioritisation of relevant items may not lessen the workload, but identifying most of the relevant studies early allows other members of the team to proceed with the next stages of the review while the remaining, largely irrelevant citations are still being screened. This shortens the turnaround time even if the total screening workload is not actually reduced.(5)
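
    A minimal sketch of the first approach (screening prioritisation) is shown below: citations that have not yet been screened are ranked by predicted relevance using TF-IDF features and a logistic-regression classifier trained on abstracts that have already been screened. The abstracts and labels are invented placeholders, and real screening-prioritisation tools, such as those reviewed by O'Mara-Eves et al.,(5) are considerably more sophisticated.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Abstracts already screened manually, with include (1) / exclude (0) decisions
    screened_abstracts = [
        "randomised controlled trial of drug X for hypertension in adults",
        "case report of a rare adverse event after surgery",
        "double-blind placebo-controlled trial of drug X with blood pressure outcomes",
        "narrative review of hypertension management guidelines",
    ]
    labels = [1, 0, 1, 0]

    # Unscreened abstracts to be prioritised
    unscreened_abstracts = [
        "multicentre randomised trial comparing drug X with placebo for hypertension",
        "editorial commentary on recent cardiovascular research funding",
    ]

    # Fit a simple relevance classifier on the screened set
    vectorizer = TfidfVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(screened_abstracts)
    model = LogisticRegression().fit(X_train, labels)

    # Rank unscreened citations by predicted probability of inclusion
    X_new = vectorizer.transform(unscreened_abstracts)
    scores = model.predict_proba(X_new)[:, 1]
    for score, abstract in sorted(zip(scores, unscreened_abstracts), reverse=True):
        print(f"{score:.2f}  {abstract}")
    ```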

    The benefits of text mining for SLRs are also clear when it comes to developing database search strings for topics described by diverse terminology. Stansfield et al. have suggested five ways in which text mining tools can aid the development of a search strategy (a toy illustration of the first two points follows the list):(11,12)

    • Improving the precision of searches – Framing more precise phrases instead of single-word terms
    • Identifying search terms to improve search sensitivity – Using additional search terms
    • Aiding the translation of search strategies across databases
    • Searching and screening within an integrated system
    • Developing objectively derived search strategies
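
    As a toy illustration of the first two points, the sketch below counts frequent words and two-word phrases in a small set of abstracts already known to be relevant; frequently occurring terms are candidates for additional search terms or for more precise phrases. The abstracts are placeholder text, and dedicated tools (e.g. those discussed by Stansfield et al.) offer far richer term analysis.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer

    # A handful of abstracts already known to be relevant (placeholder text)
    relevant_abstracts = [
        "text mining to support citation screening in systematic reviews",
        "machine learning for study identification in systematic reviews",
        "automating citation screening with text classification methods",
    ]

    # Count single words and two-word phrases, ignoring common stop words
    vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
    counts = vectorizer.fit_transform(relevant_abstracts)

    # Sum frequencies across abstracts and list the most common candidate terms
    term_freq = counts.sum(axis=0).A1
    terms = vectorizer.get_feature_names_out()
    for term, freq in sorted(zip(terms, term_freq), key=lambda x: -x[1])[:10]:
        print(f"{freq}  {term}")
    ```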

    The utility of these tools depends on their particular capabilities, the way they are used, and the text being analysed.(11)

    Moreover, Li et al. have proposed a text mining framework to reduce the abstract-screening burden and to provide a high-level information summary while conducting SLRs. The framework includes three self-defined, semantics-based ranking metrics covering keyword, indexed-term, and topic relevance, and it has been reported to reduce the labour of SLRs considerably while maintaining comparatively high recall.(13)

    The array of issues surrounding text mining makes it difficult to identify a single most effective approach for its use in SLRs. There are, however, key messages and toolsets available for applying text mining in the SLR context. Future research in this area should address the duplication of evaluations as well as the feasibility of these toolsets across a range of subject areas.(5)

    References 

    1. Gough D, Oliver S, Thomas J. An Introduction to Systematic Reviews. London: Sage; 2012.
    2. Gough D, Thomas J, Oliver S. Clarifying differences between review designs and methods. Syst Rev 2012; 1(28).
    3. Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med 2010; 7(9).
    4. Gomersall A, Cooper C. Database selection bias and its effect on systematic reviews: a United Kingdom perspective. Joint Colloquium of the Cochrane and Campbell Collaborations; Keystone, Colorado: The Campbell Collaboration; 2010.
    5. O’Mara-Eves A, Thomas J, McNaught J, et al. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 2015; 4(1):5.
    6. Lefebvre C, Manheimer E, Glanville J. Searching for studies (chapter 6). In: Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. Oxford: The Cochrane Collaboration; 2011.
    7. Allen I, Olkin I. Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA 1999; 282(7):634-5.
    8. Ananiadou S, McNaught J. Text Mining for Biology and Biomedicine. Boston/London: Artech House; 2006.
    9. Hearst M. Untangling text data mining. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 1999); 1999. pp. 3–10.
    10. Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synth Methods 2011; 2(1):1–14.
    11. Stansfield C, O’Mara-Eves A, Thomas J. Text mining for search term development in systematic reviewing: A discussion of some methods and challenges. Res Synth Methods 2017; 8(3):355-365.
    12. Gore G. Text mining for searching and screening the literature. McGill. April, 2019. 
    13. Li D, Wang Z, Wang L, et al. A Text-Mining Framework for Supporting Systematic Reviews. Am J Inf Manag 2016; 1(1):1-9.

    Written by: Ms. Tanvi Laghate

  • How to Increase the Data Extraction Quality of Systematic Literature Reviews?

    Systematic literature reviews (SLRs) are the foundation of evidence-based healthcare. Explicit methods need to be applied when conducting SLRs to minimize bias and thereby provide more reliable findings, since bias can be introduced at every step of the review process. For instance, bias can occur while identifying and screening studies, while selecting studies (e.g. due to unclear inclusion criteria), during the data extraction process, and during the validity assessment of included studies. (1,2)

    Data extraction, or data collection, is a critical step in carrying out SLRs. The process can be defined as extracting any type of data from primary studies into some form of standardized table.(2) Data extraction is one of the most time-consuming tasks in an SLR and is critical to the validity of its results. (3) In reality, it typically takes between 2.5 and 6.5 years for a primary study publication to be included in a new, published SLR. (4) Moreover, almost 23% of SLRs are out of date within 2 years of publication because new evidence has emerged that could change their primary results. (5)
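
    To make the idea of a standardized extraction table concrete, the sketch below writes a few extracted fields for two hypothetical primary studies into a CSV file. The field names, study identifiers, and values are invented for illustration; a real extraction form would contain many more items.

    ```python
    import csv

    # Hypothetical records extracted from two primary studies using a standardized form
    extracted_records = [
        {"study_id": "Smith 2018", "design": "RCT", "n_randomised": 120,
         "primary_outcome": "pain score at 12 weeks", "effect_estimate": -1.4},
        {"study_id": "Jones 2020", "design": "RCT", "n_randomised": 248,
         "primary_outcome": "pain score at 12 weeks", "effect_estimate": -0.9},
    ]

    # Write the records to a standardized extraction table (one row per study)
    fieldnames = ["study_id", "design", "n_randomised", "primary_outcome", "effect_estimate"]
    with open("extraction_table.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(extracted_records)
    ```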

    The literature further reports a high prevalence of extraction errors, although these often have only a moderate impact on the results of an SLR.(2,6,7) Nevertheless, the frequency of data extraction errors underlines the importance of quality-assurance measures for data extraction in order to minimize the risk of biased results and wrong conclusions. (8) There is therefore a clear need to identify ways of improving the quality of data extraction for SLRs.

    One of the first options is the use of two independent reviewers to extract the data, a process known as ‘double data extraction’. This process has been reported to result in fewer extraction errors; (9) however, it may not always be necessary, which justifies the alternative process of ‘reduced extraction’. Reduced extraction focuses on the critical elements (e.g. primary outcomes) that form the basis of the conclusions rather than on the extraction of less critical parameters (e.g. patient characteristics, additional outcomes). (2) This is also recommended by the Methodological Expectations of Cochrane Intervention Reviews (MECIR), which state that “dual data extraction is particularly important for outcome data, which feed directly into syntheses of the evidence, and hence to the conclusions of the review”. (10) In addition, the Institute of Medicine (IOM) states that reviewers should “at minimum, use two or more researchers, working independently, to extract quantitative and other critical data from each study”. (11)
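
    To illustrate what double data extraction catches, the sketch below compares two reviewers' extractions of the same hypothetical study and flags any field on which they disagree and which therefore needs reconciliation. The field names and values are invented for demonstration.

    ```python
    # Hypothetical extractions of the same study by two independent reviewers
    reviewer_1 = {
        "sample_size": 120,
        "primary_outcome": "pain score at 12 weeks",
        "mean_difference": -1.4,
        "follow_up_weeks": 12,
    }
    reviewer_2 = {
        "sample_size": 120,
        "primary_outcome": "pain score at 12 weeks",
        "mean_difference": -1.2,   # transcription discrepancy
        "follow_up_weeks": 12,
    }

    # Flag every field on which the two extractions disagree
    discrepancies = {
        field: (reviewer_1[field], reviewer_2[field])
        for field in reviewer_1
        if reviewer_1[field] != reviewer_2[field]
    }

    if discrepancies:
        print("Fields to reconcile by discussion or by a third reviewer:")
        for field, (v1, v2) in discrepancies.items():
            print(f"  {field}: reviewer 1 = {v1!r}, reviewer 2 = {v2!r}")
    else:
        print("No discrepancies detected.")
    ```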

    Moreover, training the reviewer team in data extraction (e.g. using a sample of studies) before the complete data extraction is performed is essential to harmonize the end results and to clear up common misunderstandings; this particularly reduces interpretation and selection errors, as well as the time and effort required. (6) The reduction of time and effort is especially useful for rapid reviews, which aim to deliver timely yet systematic results. (12)

    Automated data extraction has recently been proposed as a way both to reduce errors and to complete SLRs in a more timely fashion. Natural language processing (NLP) is one such approach: it discovers new, previously unidentified information by automatically extracting data from written resources. (13) The process comprises concept extraction/entity recognition and relation/association extraction. NLP has been used to automate the extraction of genomic and clinical information from the biomedical literature; however, the automation of data extraction for SLRs has not yet been fully explored. Techniques such as NLP could initially be used to monitor manual data extraction (which is currently performed in duplicate), then to validate extraction performed by a single reviewer, then to become the primary source of extracted data elements that are validated by a human, and eventually to automate data extraction completely, enabling faster and more efficient SLRs. (14)
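
    A deliberately simplified illustration of rule-based extraction is sketched below: a regular expression pulls a reported sample size out of abstract text. Production NLP pipelines rely on much richer methods (named-entity recognition, relation extraction, machine-learned models), but the sketch conveys the basic idea; the abstract text and the rule are invented for demonstration.

    ```python
    import re

    abstract = (
        "In this double-blind trial, 248 patients were randomised to receive "
        "either the intervention or placebo and were followed for 24 weeks."
    )

    # Naive rule: a number immediately followed by 'patients' or 'participants'
    pattern = re.compile(r"(\d[\d,]*)\s+(?:patients|participants)", re.IGNORECASE)

    match = pattern.search(abstract)
    if match:
        sample_size = int(match.group(1).replace(",", ""))
        print(f"Extracted sample size: {sample_size}")
    else:
        print("No sample size found; fall back to manual extraction.")
    ```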

    Having said that, there are no specific, established standards for data extraction, because the actual benefit of a given extraction method (e.g. independent data extraction) or of particular reviewer-team characteristics (e.g. expertise) is not well established. This warrants more comparative studies to further understand the influence of different extraction methods. In particular, studies exploring the need for training in data extraction are vital, as such analyses are lacking to date. Scientific expertise can be used more efficiently by applying methods that require less effort without threatening internal validity. Finally, enhancing the knowledge base would also help in planning effective training strategies for new reviewers and students in the future.(2)

    References

    1. Felson DT. Bias in meta-analytic research. J Clin Epidemiol 1992; 45(8):885-92. 
    2. Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: A methodological review. BMC Med Res Methodol 2017; 17(1):152.
    3. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.
    4. Elliott J, Turner T, Clavisi O, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med 2014; 11:e1001603.
    5. Shojania KG, Sampson M, Ansari MT, et al. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 2007; 147(4):224-33.
    6. Haywood KL, Hargreaves J, White R, et al. Reviewing measures of outcome: reliability of data extraction. J Eval Clin Pract 2004; 10:329–337.
    7. Carroll C, Scope A, Kaltenthaler E. A case study of binary outcome data extraction across three systematic reviews of hip arthroplasty: errors and differences of selection. BMC research notes 2013; 6:539.
    8. Gøtzsche PC, Hróbjartsson A, Marić K, et al. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298(4):430–437. 
    9. Tendal B, Higgins JP, Juni P, et al. Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ 2009; 339: b3128. 
    10. Higgins JPT, Lasserson T, Chandler J, et al. Methodological Expectations of Cochrane Intervention Reviews. London: Cochrane; 2016. 
    11. Morton S, Berg A, Levit L, Eden J. Finding what works in health care: standards for systematic reviews. National Academies Press; 2011.
    12. Schünemann HJ, Moja L. Reviews: rapid! Rapid! Rapid! …and systematic. Syst Rev 2015; 4(1):4.
    13. Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. pp. 3–10. 
    14. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev 2015; 4:78.

    Written by: Ms. Tanvi Laghate

  • All You Need to Know About PROBAST

    Today’s era of risk-based, precision, and personalized medicine demands clinical prediction models. Prediction modelling studies focus on two kinds of outcomes: diagnosis (the probability that a condition is currently present but undetected) and prognosis (the probability of developing a certain outcome in the future). (1,2) These studies develop, validate, or update a multivariable prediction model in which multiple predictors are used in combination to estimate probabilities that inform, and often guide, individual care. The literature shows that both prognostic and diagnostic models are widely used across medical domains and settings, (3) such as cancer, (4) neurology, (5) and cardiovascular disease. (6) Competing prediction models for the same outcome or target population are increasingly common, which necessitates systematic reviews of prediction model studies, since the coexistence of competing models may cause confusion among health care providers, guideline developers, and policymakers about which model to use or recommend, and in which persons or settings. (1,7)

    Quality assessment is vital in any systematic review, and several tools exist to assess the risk of bias (ROB). (8) For example, the QUIPS (Quality In Prognosis Studies) tool evaluates ROB in predictor finding (prognostic factor) studies. (9) Similarly, the revised Cochrane ROB tool (ROB 2.0) (10) assesses the methodological quality of prediction model impact studies that use a randomized comparative design, while ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions) covers those with a non-randomized comparative design. (11) Prediction model studies and their systematic reviews are now often used as evidence for clinical guidance and decision making, which warrants a tool for the quality assessment of individual prediction model studies. PROBAST (Prediction model Risk Of Bias ASsessment Tool) was recently introduced for this purpose, filling the gap left by the lack of a suitable tool for evaluating ROB in systematic reviews of diagnostic and prognostic prediction model studies. (7,8,12)

    Bias is a systematic error in a study that leads to inaccurate results and so undermines the study’s internal validity. (8) Inadequacies in study design, conduct, or analysis can distort estimates of a model’s predictive performance and thereby introduce ROB. In addition, when a study’s population, predictors, or outcomes differ from those specified in the review question, concerns arise about the applicability of that primary study. PROBAST was therefore developed to address the lack of a tool designed specifically to assess the ROB and applicability of primary prediction model studies.

    Development of PROBAST:

    A 4-stage approach for developing health research reporting guidelines was used to develop PROBAST, consisting of the following stages: 1) defining the scope, 2) reviewing the evidence base, 3) running a Web-based Delphi procedure, and 4) refining the tool through piloting. (8,13) PROBAST was designed mainly to assess primary prediction model studies included in a systematic review, not predictor finding or prediction model impact studies. The steering group of 9 experts in prediction model studies and in the development of quality assessment tools agreed that PROBAST would assess both ROB and concerns regarding the applicability of a study evaluating a multivariable prediction model intended for individualized diagnosis or prognosis.

    In the first stage, a domain-based structure was adopted to define the scope of PROBAST, similar to that used in other ROB tools such as ROB 2.0, ROBINS-I, QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2), and ROBIS. In the second stage, three approaches were used to build an evidence base: relevant methodological reviews in the area of prediction model research were identified, members of the steering group identified relevant methodological studies, and additional evidence was gathered through the Delphi procedure in a wider group. This evidence produced an initial list of signalling questions to consider for inclusion in PROBAST. In the third stage, a modified Delphi process using web-based surveys was run over 7 rounds to obtain structured feedback and agreement on the scope, structure, and content of PROBAST. The 38-member Delphi group included methodological experts in prediction model research and in the development of quality assessment tools, experienced systematic reviewers, commissioners, and representatives of reimbursement agencies; including these stakeholders ensured fair representation of the views of end users, methodological experts, and decision makers. In the fourth stage, the then-current version of PROBAST was piloted at multiple workshops at consecutive Cochrane Colloquia and in numerous workshops with MSc and PhD students, and the feedback was used to refine the content and structure of the tool, the wording of the signalling questions, and the accompanying guidance documents. (7,8)

    PROBAST consists of 4 steps: 1) specifying the systematic review question, 2) classifying the type of prediction model, 3) assessing ROB and applicability, and 4) reaching an overall judgement. It is the first comprehensively developed tool designed explicitly to assess the quality of prediction model studies covering the development, validation, or updating of both diagnostic and prognostic models, regardless of the medical domain, type of outcome, predictors, or statistical technique used. (7,8) PROBAST was introduced in two companion publications: the first, by Wolff et al., (8) describes its development and scope, while the second, by Moons et al., (7) explains how to apply it and how to judge ROB and applicability.
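
    As a rough illustration of the assessment step, the sketch below represents a PROBAST-style assessment as domain-level ROB judgements and applies a simplified reading of the overall-judgement rule (any domain at high risk makes the study high risk; all domains at low risk makes it low risk). The four domain names follow the PROBAST publications, but the example judgements and the simplified rule are illustrative only and are not a substitute for the full signalling questions and guidance.

    ```python
    # Simplified PROBAST-style assessment: each of the four risk-of-bias domains
    # receives a judgement of "low", "high", or "unclear" based on its signalling questions.
    domain_judgements = {
        "participants": "low",
        "predictors": "low",
        "outcome": "unclear",
        "analysis": "high",
    }

    def overall_risk_of_bias(domains):
        """Simplified overall judgement: any 'high' domain -> high risk of bias;
        all 'low' -> low risk; otherwise the overall judgement is unclear."""
        ratings = set(domains.values())
        if "high" in ratings:
            return "high"
        if ratings == {"low"}:
            return "low"
        return "unclear"

    print(f"Overall risk of bias: {overall_risk_of_bias(domain_judgements)}")
    ```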

    Potential users of PROBAST include organizations that support decision making (such as the National Institute for Health and Care Excellence and the Institute for Quality and Efficiency in Health Care); researchers and clinicians interested in evidence-based medicine or involved in guideline development; and journal editors and manuscript reviewers. (8)

    References 

    1. Bouwmeester W, Zuithoff NP, Mallett S, et al. Reporting and methods in clinical prediction research: a systematic review. PLoS Med 2012; 9:1-12.
    2. Steyerberg EW, Moons KG, van der Windt DA, et al; PROGRESS Group. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med 2013; 10:e1001381.
    3. Collins GS, Mallett S, Omar O, et al. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Med 2011; 9:103.
    4. Altman DG. Prognostic models: a methodological framework and review of models for breast cancer. Cancer Invest 2009; 27:235-43.
    5. Counsell C, Dennis M. Systematic review of prognostic models in patients with acute stroke. Cerebrovasc Dis 2001; 12:159-70.
    6. Damen JA, Hooft L, Schuit E, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ 2016; 353:i2416.
    7. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019; 170:W1-W33.
    8. Wolff RF, Moons KGM, Riley RD, et al; for the PROBAST Group. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170:51-58.
    9. Hayden JA, van der Windt DA, Cartwright JL, et al. Assessing bias in studies of prognostic factors. Ann Intern Med 2013; 158:280-6.
    10. Higgins JPT, Savović J, Page MJ, et al; ROB2 Development Group. A revised tool for assessing risk of bias in randomized trials. In: Chandler J, McKenzie J, Boutron I, Welch V, eds. Cochrane Methods. London: Cochrane; 2018:1-69.
    11. Sterne JA, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016; 355:i4919.
    12. PROBAST. Available at: http://www.probast.org/ABOUT
    13. Moher D, Schulz KF, Simera I, et al. Guidance for developers of health research reporting guidelines. PLoS Med 2010; 7:e1000217.

    Written by: Ms. Tanvi Laghate

  • How NMAs Are Helping in Making Informed Clinical Decisions

    Network meta-analysis (NMA) extends conventional meta-analysis: instead of simply pooling trials that have all evaluated the same treatment comparison, several different treatments are compared through statistical inference.(1) NMA is also referred to as mixed treatment comparison or multiple treatments comparison meta-analysis.(2–4)

    It was recognised at the National Institute for Clinical Excellence (NICE) that technology appraisals and clinical guidelines increasingly need to be informed by integrated analyses, because there are rarely enough head-to-head comparisons of new treatments to inform clinical practice.(5) The literature suggests that NMA is a feasible option for informing clinical practice decisions, particularly where several treatments are under examination.(6,7)

    NMA combines direct evidence from within trials with indirect evidence across trials, thereby providing estimates of relative efficacy between all relevant interventions, even where no head-to-head comparison has ever been conducted.(1–3) In essence, treatment effects are estimated for all treatments or interventions in one simultaneous analysis using all the available evidence.(6,8)

    NMA relies on two main assumptions: homogeneity of the compared trials and consistency between direct and indirect evidence.(7,9) A simple example is as follows. One trial compares drug A to drug B, and another trial in the same target patient population compares drug B to drug C. If drug A is superior to drug B in the first trial, and drug B is equivalent to drug C in the second, the NMA allows the indirect inference that drug A is also superior to drug C for this target population.(1,4,7,8)
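
    To make the indirect-comparison logic concrete, the sketch below applies a Bucher-style adjusted indirect comparison to invented trial results on the log odds ratio scale: the A-versus-C effect is obtained by adding the A-versus-B and B-versus-C estimates, and its variance is the sum of their variances. The numbers are placeholders, and a full NMA estimates all contrasts simultaneously rather than one indirect chain at a time.

    ```python
    import math

    # Hypothetical pairwise results on the log odds ratio scale
    log_or_ab, se_ab = -0.50, 0.15   # trial 1: drug A vs drug B (A better)
    log_or_bc, se_bc = 0.05, 0.20    # trial 2: drug B vs drug C (roughly equivalent)

    # Adjusted indirect comparison: effects add, variances add
    log_or_ac = log_or_ab + log_or_bc
    se_ac = math.sqrt(se_ab**2 + se_bc**2)

    # 95% confidence interval for the indirect A vs C comparison
    lower = log_or_ac - 1.96 * se_ac
    upper = log_or_ac + 1.96 * se_ac

    print(f"Indirect log OR (A vs C): {log_or_ac:.2f} (95% CI {lower:.2f} to {upper:.2f})")
    print(f"Odds ratio: {math.exp(log_or_ac):.2f} "
          f"(95% CI {math.exp(lower):.2f} to {math.exp(upper):.2f})")
    ```

    Note that the standard error of the indirect estimate is larger than that of either direct comparison, which is one reason indirect evidence generally carries lower certainty.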

    The main advantage of NMA over traditional pairwise meta-analysis is that it provides estimates, with some degree of certainty, for all treatment comparisons, including comparative effects that have never been examined head to head in parallel-group randomized clinical trials.(2,4) Overall, NMA potentially enables an assessment of the benefits and harms of more than two interventions for the same clinical condition.

    In terms of limitations, NMA is more likely to be valid when the analysed studies are sufficiently homogeneous and include very similar patient populations. As an NMA increases the number and types of studies being compared and combined, the likelihood of combining heterogeneous studies also increases.(1,3,9) In addition, multiple overlapping meta-analyses with heterogeneous findings may confuse readers and decision makers. From a practical point of view, NMA is also more complex than conventional pairwise meta-analysis and requires more time and resources. Finally, while the assumptions underlying conventional pairwise meta-analyses are well researched and understood, the assumptions underlying NMA are more complex and more easily misinterpreted.

    Methodological work to address the limitations of NMA is ongoing, and researchers and end-users should therefore be cautious when interpreting NMA results: inappropriate combination of studies may overestimate treatment effects and produce misleading conclusions, with uncertain benefit for patient outcomes. Nevertheless, NMAs are increasingly attractive and useful tools because they provide a comprehensive framework for decision-making.

    References

    1. Cipriani A, Higgins JP, Geddes JR, Salanti G. Conceptual and technical challenges in network meta-analysis. Ann Intern Med. 2013 Jul 16; 159(2):130-7.
    2. Mills EJ, Thorlund K, Ioannidis JP. Demystifying trial networks and network meta-analysis. BMJ. 2013 May 14; 346: f2914.
    3. National Institute for Health and Care Excellence. Guide to the Methods of Technology Appraisal 2013 [Internet]. London: National Institute for Health and Care Excellence (NICE); 2013 Apr. Process and Methods Guides No. 9. NICE Process and Methods Guides. [Viewed on 02/08/2018]
    4. Li T, Puhan MA, Vedula SS, Singh S, Dickersin K; Ad Hoc Network Meta-analysis Methods Meeting Working Group. Network meta-analysis-highly attractive but more methodological research is needed. BMC Med. 2011 Jun 27; 9:79.
    5. Rawlins MD. In pursuit of quality: the National Institute for Clinical Excellence. Lancet. 1999; 353:1079–82.
    6. Caldwell DM, Ades AE, Higgins JP. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005; 331(7521):897–900.
    7. Tu YK, Faggion CM Jr. A primer on network meta-analysis for dental research. ISRN Dent. 2012; 2012:276520.
    8. Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics. 2008; 26(9):753–767.
    9. Donegan S, Williamson P, D’Alessandro U, Tudur Smith C. Assessing key assumptions of network meta-analysis: a review of methods. Res Synth Methods. 2013 Dec; 4(4):291-323.

    Written By – Dr. Sandeep Moola (Research Fellow, The University of Adelaide, Australia)