When seeking answers to clinical questions about the efficacy or other aspects of a proposed intervention, health professionals look for reliable evidence. Systematic literature reviews (SLRs) serve this purpose because they generate high-quality, verifiable, and trustworthy evidence in a systematic, transparent, and impartial manner.(1) In the new evidence pyramid, systematic reviews are visualized as the lens through which the certainty of all other primary research, reported across multiple study designs, can be assessed.(1,2) Reputable professional organizations worldwide most often use the SLR process to generate practice guidelines for the diagnosis and management of different disorders.(1) However, since the primary research underlying such guidelines is often of variable quality, the resulting guidelines may also be of non-uniform quality. It is therefore essential to quantify and grade the quality of the evidence and the strength of the recommendations.(1,2)

Of the frameworks available for doing this, perhaps the most frequently used is GRADE (Grading of Recommendations Assessment, Development and Evaluation). The GRADE approach starts with framing the research question under the appropriate PICOS headings (population, intervention, comparator, outcomes, and study design) and selecting the most relevant outcomes. After an SLR has gathered evidence on each selected outcome, the certainty (alternatively called the level or quality) of that evidence is graded into one of four levels:(1–5)

  1. High: The authors are very confident that the true effect lies close to the estimate of the effect.
  2. Moderate: The true effect is probably close to the estimated effect.
  3. Low: The true effect may be substantially different from the estimated effect.
  4. Very Low: The true effect is likely to be substantially different from the estimated effect.

As a starting point, evidence from randomized controlled trials (RCTs) is considered to be of higher certainty than evidence from observational studies. Other factors that affect the certainty of evidence include:

  1. Risk of bias in individual studies: This arises when the results of a study do not represent the truth because of inherent limitations in its design or conduct. Examples include methodological issues such as inadequate randomization, lack of blinding, confounding, and loss to follow-up.(2)
  2. Consistency of results between studies: When multiple studies show consistent effects, overlapping confidence intervals (CIs), and low heterogeneity, certainty in the evidence is higher. Consistency between studies is assessed from the variation in point estimates, the overlap of CIs, and statistical measures such as the I² statistic.(2,4)
  3. Indirectness of evidence: When the population of interest for which the recommendations are being prepared is different from the population found in the included studies (for example, if the study participants are adults, but the recommendations are being prepared for children), the certainty of evidence will be lower.(2)
  4. Imprecision: Described in terms of the ‘range of plausible effect sizes’, imprecision depends on the number of included patients or events and on the width of the confidence interval. If the confidence interval around the estimated effect is so wide that it hinders a decision, the evidence is imprecise.(2)
  5. Publication bias: Small studies without statistically significant results are less likely to be published, and the source of funding also contributes to publication bias. The resulting synthesized evidence is skewed towards published results.(2)
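
To make the I² statistic mentioned under consistency more concrete, the following is a minimal Python sketch. It assumes fixed-effect inverse-variance weights and the usual definition I² = max(0, (Q − df) / Q) × 100%, where Q is Cochran's Q and df is the number of studies minus one. The function name `i_squared` and the example numbers are hypothetical illustrations, not data from any of the cited studies.

```python
def i_squared(effects, std_errors):
    """Return (Q, I2) for a set of study effect estimates.

    effects     -- per-study effect estimates (e.g. log risk ratios)
    std_errors  -- per-study standard errors of those estimates
    I2 is returned as a percentage between 0 and 100.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]

    # Fixed-effect pooled estimate (weighted mean of the study effects).
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

    # I2: share of the variation in excess of what chance alone would give.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical effect estimates and standard errors from three studies.
    q, i2 = i_squared([0.20, 0.35, 0.60], [0.10, 0.12, 0.15])
    print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

A higher I² suggests that more of the variation between studies reflects real differences rather than chance. How large a value justifies lowering the certainty rating remains a judgment for the review authors.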

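Evidence that is initially rated as ‘high’ may be downgraded after consideration of the aforementioned criteria; conversely, evidence that starts at a lower rating, such as evidence from observational studies, may be upgraded, for example when the observed effect is large.(3)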
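
The rating logic described above can be sketched in a few lines of Python. This is an illustrative simplification rather than an official GRADE tool. It assumes that RCT evidence starts at ‘high’ and observational evidence at ‘low’, and that each of the five domains can lower the rating by one or two levels. The function name `grade_certainty` and the domain keys are hypothetical, and in practice every step reflects a reviewer's judgment rather than a mechanical rule.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# The five domains discussed above (key names are illustrative).
DOMAINS = {
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
}

# Assumed starting points: RCTs start high, observational studies start low.
START = {"rct": 3, "observational": 1}


def grade_certainty(study_design, downgrades=None, upgrades=0):
    """Return the certainty level for one outcome.

    study_design -- "rct" or "observational"
    downgrades   -- dict mapping a domain name to 0, 1, or 2 levels
    upgrades     -- number of levels to rate up (e.g. for a large effect)
    """
    if study_design not in START:
        raise ValueError(f"unknown study design: {study_design!r}")
    downgrades = downgrades or {}
    for domain, steps in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        if steps not in (0, 1, 2):
            raise ValueError("each domain lowers the rating by 0, 1, or 2")

    index = START[study_design] - sum(downgrades.values()) + upgrades
    # The rating cannot go below "very low" or above "high".
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


if __name__ == "__main__":
    # RCT evidence with serious risk of bias and serious imprecision.
    print(grade_certainty("rct", {"risk_of_bias": 1, "imprecision": 1}))
    # Observational evidence rated up one level for a large effect.
    print(grade_certainty("observational", upgrades=1))
```

Because each outcome is graded separately, such a function would be applied once per outcome, using the judgments recorded for that outcome.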

At the end of the grading process, every outcome of interest has an assigned level of certainty. Based on the evidence so collected, recommendations are usually drawn up by a guideline panel. Considerations that go into this process include the balance of benefits and risks, the certainty of the evidence, values and preferences, costs, feasibility, acceptability, and equity.(1) Under the GRADE approach, the resulting practice recommendations are assigned a strength: a recommendation can be ‘strong’ or ‘conditional’ (also called ‘weak’), and ‘for’ or ‘against’ a specific action in a specific situation. A strong recommendation suggests that most, if not all, people would choose the intervention. A conditional recommendation implies that the decision is likely to vary considerably between individuals.(2,4)

It is important to acknowledge that using GRADE commonly involves some subjective judgment, and assessments may vary between individuals. Despite this, the GRADE approach has become a key component of clinical practice guidelines based on high-quality SLRs, since it provides a systematic, explicit, and transparent way of grading the certainty of evidence and the strength of practice recommendations.



  1. Granholm A, Alhazzani W, Møller MH. Use of the GRADE approach in systematic reviews and guidelines. Br J Anaesth. 2019 Nov;123(5):554–9.
  2. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008 Apr 26;336(7650):924–6.
  3. Goldet G, Howick J. Understanding GRADE: an introduction. J Evid Based Med. 2013 Feb 28;6(1):50–4.
  4. Kirmayr M, Quilodrán C, Valente B, Loezar C, Garegnani L, Franco JVA. The GRADE approach, Part 1: how to assess the certainty of the evidence. Medwave. 2021 Mar 31;21(2):e8109.
  5. Balshem H, Helfand M, Schünemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol. 2011 Apr;64(4):401–6.
