Real-World Data (RWD), from which Real-World Evidence (RWE) is generated, are uniquely able to depict outcomes as they occur in routine practice. RWD can also shorten research and development timelines and generate deeper insights into the disease process. However, RWD from a single source often suffer from equipment-related bias, a lack of phenotypic diversity, limited training models, and a lack of diverse cohorts. RWD are also scattered and structured in diverse formats, which makes it difficult to unlock their full value. Furthermore, health data are personal, highly sensitive, and subject to data privacy rights and regulations.[1] Because of these barriers, new methods are needed to unlock the full potential of RWD, and Federated Data Networks (FDNs) are one such attempt.

An FDN is a set of decentralized, interconnected nodes that allows data to be queried and analysed by the other nodes in the network without the data leaving the parent node.[1] Member nodes are governed by a common framework that provides harmonized standards and tools for data access. Each member node is semi-independent, in that it decides whether to grant access to its data. Since the shared data are masked, blocked or anonymized, member nodes have limited knowledge of the identities behind the data held in other nodes; as a result, data ownership is maintained. Algorithms are trained collaboratively, without data exchange, through an approach called federated learning. Thus, FDNs provide safe data mining with regulated access to diverse data, without crossing legal barriers.
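To make the idea of collaborative training without data exchange concrete, here is a minimal sketch of federated averaging, one common federated learning scheme. The source does not say which algorithm any particular FDN uses, so this is an illustration under stated assumptions: the linear model, the toy datasets, and the names local_update, federated_average and train are all hypothetical.

```python
from typing import List, Tuple

# Each node's private data: (x, y) pairs that never leave the node.
Dataset = List[Tuple[float, float]]
Weights = Tuple[float, float]  # (slope w, intercept b) of y ~ w*x + b


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one node's own data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on this node's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine node updates, weighting each by its local sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train(nodes: List[Dataset], rounds: int = 200) -> Weights:
    """Coordinator loop: only model parameters cross node boundaries."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in nodes]
        weights = federated_average(updates, [len(d) for d in nodes])
    return weights


# Hypothetical toy data: three nodes whose records all follow y = 2x + 1.
nodes = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]
w, b = train(nodes)
print(f"learned slope={w:.3f}, intercept={b:.3f}")
```

In this sketch the coordinator sees only two numbers per node per round, never the underlying records. Real deployments typically add safeguards such as secure aggregation or differential privacy, which are outside the scope of this illustration.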

In contrast to data sharing, data transfer, or data pooling, an FDN works by data visiting: the analysis travels to the data, and only the results are returned, to be applied to or used to modify existing guidelines and practice.[1] To illustrate, consider two people on a phone call: ideas are shared without sharing identities, billing addresses, and so on. Similarly, FDNs share only mathematical values and metadata sets, without sharing confidential patient identities. An example of an FDN is the TIES (Text Information Extraction System) Cancer Research Network (TCRN). This FDN has 4 active nodes that search across 5.8 million cases and 2.5 million patients to assess cohorts with rare phenotypes.[2]
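The data-visiting pattern can be sketched as a federated cohort count: the query is sent to each node, each node evaluates it locally, and only an aggregate number comes back. This is a hypothetical illustration, not a description of how TCRN or any specific network is implemented. The Node class, the record fields, and the small-cell suppression threshold (min_cell) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Criterion = Callable[[Record], bool]


@dataclass
class Node:
    """One member site. Its records stay local and are never returned."""
    name: str
    records: List[Record]

    def count(self, criterion: Criterion, min_cell: int = 5) -> Optional[int]:
        """Return the number of matching records, or None if the count is
        below min_cell (an assumed small-cell suppression rule)."""
        n = sum(1 for r in self.records if criterion(r))
        return n if n >= min_cell else None


def federated_count(nodes: List[Node], criterion: Criterion,
                    min_cell: int = 5) -> Dict[str, object]:
    """Visit every node with the same query and collect only the counts."""
    per_node = {node.name: node.count(criterion, min_cell) for node in nodes}
    reported = sum(c for c in per_node.values() if c is not None)
    return {"per_node": per_node, "reported_total": reported}


# Hypothetical toy sites with made-up records.
site_a = Node("site_a",
              [{"diagnosis": "sarcoma", "age": 40 + i} for i in range(6)]
              + [{"diagnosis": "melanoma", "age": 55}] * 2)
site_b = Node("site_b",
              [{"diagnosis": "sarcoma", "age": 61},
               {"diagnosis": "sarcoma", "age": 47},
               {"diagnosis": "lymphoma", "age": 39}])

result = federated_count([site_a, site_b],
                         lambda r: r["diagnosis"] == "sarcoma")
print(result)
```

Here site_b holds only two matching cases, so under the assumed threshold it reports nothing rather than a small, potentially identifying count. The requester learns a cohort size without seeing any patient-level row.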

FDNs can have a major impact on stakeholders such as physicians, hospitals, insurance companies, researchers and patients. With the surge in digital health devices, the federated model offers physicians and hospitals good options for training models. FDNs are useful in disease classification, mortality forecasting, and predicting treatment outcomes. Reported applications of FDNs include prognosis in stroke prevention, improving patient pathways in cancer, coronary artery disease, classification of EEG recordings, brain tumor classification, breast density classification, multi-disease chest X-ray classification, adverse drug reaction prediction, recognition of human activity and emotion, and prediction of oxygen requirements in COVID-19 patients.[3,4] Federated approaches using diverse datasets from different institutions have achieved 98.3% accuracy in COVID-19 detection, 95.4% accuracy in recognizing human activity and emotions, and 97.7% accuracy in mortality prediction.[3]

In clinical research, FDNs support protocol optimization, patient selection, and adverse effect monitoring. They also facilitate translational research. A federated approach paves the way for research on rare diseases, where incidence rates are low and datasets are small. FDNs save time and resources by rapidly identifying target patients for recruitment into clinical trials. FDNs also aid disease surveillance by drawing on data from different geographies.[5] For manufacturers, FDNs facilitate continuous product validation and improvement.

To ensure data privacy, FDNs operate in line with various regulatory provisions, such as the General Data Protection Regulation (GDPR) of Europe; the Data Protection Act (DPA) of the UK; the Health Insurance Portability and Accountability Act (HIPAA) and the California Consumer Privacy Act (CCPA) of the United States; CDSCO of India; the Personal Information Protection Law (PIPL), Cybersecurity Law (CSL), and Data Security Law (DSL) of China; and the law On Personal Data (OPD) of Russia.[6]

Despite their utility in data networking, FDNs face certain challenges in creating robust RWE. Insufficient and inconsistent data, uneven data quality, bias, and a lack of data standardization are some of the factors that can lead to inconsistent conclusions. Cloud-based data infrastructure, which is crucial for developing FDNs, is not available at all health care institutions. In practice, debugging and optimizing FDNs is strenuous, because hardware and networking differ from site to site, which makes the learning algorithms behave differently across sites.[7] Another important challenge is the discrepancy in research grant funding. Larger hospitals may contribute more datasets and may therefore expect more research grant funds. However, funding should reflect the value of these contributions rather than the size of the datasets. That said, the main difficulty lies in accurately measuring the value of these contributions.[7]

The success of FDNs depends on strong and consistent governance, coupled with open lines of communication among partners. An incentive-based approach can also boost the quality and quantity of data contributions from member nodes. With these steps, FDNs can significantly increase external validity and enhance the robustness and quality of RWD and the resulting RWE, without the need to centralize datasets, thereby realizing the promise of precision medicine.



  1. Hallock H et al. Federated networks for distributed analysis of health data. Frontiers in Public Health. 2021;9.
  2. Jacobson R et al. A federated network for translational cancer research using clinical data and biospecimens. Cancer Research. 2015;75(24):5194-5201.
  3. Prayitno et al. A systematic review of federated learning in the healthcare area: from the perspective of data properties and applications. Applied Sciences. 2021;11(23):11191.
  4. Joshi M et al. Federated learning for healthcare domain – pipeline, applications and challenges. ACM Transactions on Computing for Healthcare. 2022.
  5. Au F. Aggregated data or federated data: is one better than the other?
  6. The best of both worlds: benefits of applying AI/ML in a federated data network.
  7. Ng D et al. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quantitative Imaging in Medicine and Surgery. 2021;11(2):852-857.

