
Systematic literature reviews (SLRs) play a pivotal role in evidence-based decision-making by synthesizing existing research to guide scientific practice. The exhaustive literature searches that SLRs require typically span multiple databases and other information sources. As a result, the initial pool of potential hits often contains a sizeable number of duplicate records, the same study captured from different sources. Identifying and removing these duplicates is critical to the credibility and transparency of an SLR.[1]
The inclusion of duplicate records in SLRs can have far-reaching consequences. The most immediate is inflation of the apparent number of studies, which distorts the evidence base and can affect the conclusions drawn from the review. Duplicates also threaten the reliability of findings by skewing the analysis towards particular studies or authors and compromising the impartiality of the review. Thorough deduplication therefore matters not only for accuracy but also for maintaining efficiency and reducing bias in the systematic review process.[1] Proper deduplication ensures that the conclusions of an SLR rest on a fair representation of the available evidence, free from the distorting effects of redundant records.[2]
Deduplication strategies in systematic reviews encompass a spectrum of approaches, ranging from manual screening to the utilization of advanced automated tools. Reference management software, such as EndNote, Mendeley, or Zotero, offers built-in deduplication features based on metadata matching. However, these tools may overlook duplicates with minor variations or different publication formats. Automated deduplication tools like Rayyan, Covidence, or Systematic Review Assistant employ sophisticated algorithms that consider title and abstract similarity, full-text comparisons, and citation matching. While these tools enhance accuracy, manual screening remains indispensable, particularly for cases with significant variations in title or authorship.[3]
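To make the metadata-matching idea concrete, the sketch below flags likely duplicate records by comparing DOIs when both records carry one and, otherwise, the similarity of normalized titles. It is a simplified illustration, not the algorithm of any particular tool; the field names, the 0.9 similarity threshold, and the sample records are assumptions chosen for demonstration.

```python
# Simplified duplicate-detection sketch (not any specific tool's algorithm).
# Assumed record fields: "title", "year", "doi"; the 0.9 threshold is illustrative.
from difflib import SequenceMatcher
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def is_likely_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Match on DOI when both records have one; otherwise compare normalized titles."""
    doi_a, doi_b = rec_a.get("doi", "").lower(), rec_b.get("doi", "").lower()
    if doi_a and doi_b:
        return doi_a == doi_b
    similarity = SequenceMatcher(
        None, normalize_title(rec_a["title"]), normalize_title(rec_b["title"])
    ).ratio()
    return similarity >= threshold and rec_a.get("year") == rec_b.get("year")

# Hypothetical records exported from two different databases
rec_db1 = {"title": "De-duplication: a cornerstone of quality reviews.", "year": 2023, "doi": ""}
rec_db2 = {"title": "De-duplication - a cornerstone of quality reviews", "year": 2023, "doi": ""}
print(is_likely_duplicate(rec_db1, rec_db2))  # True: titles match after normalization
```

Production tools weight additional fields such as authors, journal, volume, and pagination, which is one reason borderline matches still need manual screening.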
Effective deduplication starts with a clear, documented protocol specifying the criteria and tools to be used. Results from multiple search runs usually need to be consolidated, and automated tools should be calibrated regularly against manual identification. Transparent documentation of the deduplication process in the review report enables other researchers to replicate the review steps and contributes to the cumulative knowledge in the field.[1]
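As a rough sketch of what such calibration might look like in practice, the snippet below compares a tool's duplicate flags against a manually adjudicated sample and reports sensitivity and specificity, the metrics used when SRA-DM was evaluated.[4] All record identifiers and counts are hypothetical.

```python
# Hypothetical calibration check: automated duplicate flags vs. a manually
# adjudicated gold standard. All record IDs are made up for illustration.
manual_duplicates = {"rec_102", "rec_215", "rec_330", "rec_478"}   # gold standard
tool_duplicates   = {"rec_102", "rec_215", "rec_512"}              # tool output
all_records       = {f"rec_{i}" for i in (102, 215, 330, 478, 512, 601, 777)}

true_pos  = len(tool_duplicates & manual_duplicates)               # duplicates correctly flagged
false_pos = len(tool_duplicates - manual_duplicates)               # unique records flagged in error
false_neg = len(manual_duplicates - tool_duplicates)               # duplicates the tool missed
true_neg  = len(all_records - manual_duplicates - tool_duplicates) # unique records left alone

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

A tool tuned too aggressively will show high sensitivity but poor specificity, which corresponds to the risk of wrongly excluding unique records discussed below.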
The evolution of deduplication methods reflects the continuous quest for more effective tools. From manual cataloguing in the early days of bibliographic record-keeping to sophisticated automated techniques, the landscape of deduplication has undergone significant changes. Reference management software introduced automated deduplication features, and the advent of Digital Object Identifiers (DOIs) in the late 1990s facilitated more accurate deduplication. Researchers have explored various methods and tools, each with its strengths and limitations. Recent advancements include dedicated tools such as Deduklick, an artificial intelligence-based algorithm that combines natural language processing with expert-defined rules. The Bramer method, introduced in 2016, focuses on adapting page number formats to facilitate deduplication in EndNote. The Systematic Review Assistant-Deduplication Module (SRA-DM), developed in 2015, demonstrated superior sensitivity and specificity compared to EndNote's default deduplication process.[4]
However, deduplication is not without its challenges. Non-standard citations, variations in database indexing, cross-language studies, and the existence of multiple versions of the same studies pose hurdles for both automated and manual deduplication. The variability in data, differences in citation formats, and the risk of exclusion due to overly aggressive deduplication demand careful consideration.[1]
In conclusion, deduplication is a critical step in maintaining the accuracy, reliability, and transparency of SLRs. The evolving landscape of tools and methodologies underscores the continuous commitment to refining and advancing deduplication practices. As technology continues to progress, collaboration across disciplines and ongoing research will shape the future of deduplication, paving the way for more effective and efficient processes in evidence synthesis. Researchers and reviewers must remain vigilant, adopting best practices and considering the challenges inherent in the deduplication process to uphold the highest standards of scientific rigor in SLRs.
References
- Hammer B, Virgili E, Bilotta F. Evidence-based literature review: de-duplication a cornerstone for quality. World J Methodol. 2023;13(5):390.
- Kwon Y, Lemieux M, McTavish J, Wathen N. Identifying and removing duplicate records from systematic review searches. J Med Libr Assoc. 2015;103(4):184-188.
- Puljak L, Lund H. Definition, harms, and prevention of redundant systematic reviews. Syst Rev. 2023;12(1):63.
- Rathbone J, Carter M, Hoffmann T, Glasziou P. Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module. Syst Rev. 2015;4:6.

