The concept of “Big Data” is trending today. It is characterized by data sources with huge quantities, high speed and broad diversity of information. Healthcare industries are trying to apply Big Data analytics to turn data into a workable platform that generates information supporting better and faster clinical decisions, with goals such as reducing readmissions, scaling down hospital-associated illnesses, identifying and eliminating waste, and improving clinician workflow. Governments and the private sector are adopting Big Data to enable better, quicker and more valuable care delivery to people. (1)
With rising discussion of Big Data, artificial intelligence, and related techniques in health care, the appropriate and, more importantly, ethical use of these methods is becoming increasingly relevant. (1) Privacy and confidentiality are closely associated. Data privacy concerns the right of individuals to maintain control over their own health information, while confidentiality is the responsibility of the entities entrusted with those data to maintain that privacy. (2) Concerns about data privacy and confidentiality limit the scope, proper storage, accessibility, and propagation of data, particularly highly sensitive or personal data. The ever-expanding scope of data collection, storage and analysis (3,4) further adds to the risk of data privacy infringements. (5,6) In addition, data anonymity does not protect against the subsequent identification of individuals through the joining of data sets and re-identification, (7) against data manipulation and discrimination, (8) or against other inappropriate uses of data. (9) Therefore, protected management of patient data is necessary, since healthcare clouds link large amounts of data from disparate networks. (10)
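The re-identification risk from joining data sets can be illustrated with a minimal sketch. All records, field names and the choice of quasi-identifiers below are hypothetical and chosen only for illustration; real linkage attacks usually rely on fuzzier matching than the exact comparison assumed here.

```python
# Hypothetical "anonymized" hospital release: names removed, quasi-identifiers kept.
hospital_release = [
    {"zip": "75001", "birth_year": 1950, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "75001", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
    {"zip": "75002", "birth_year": 1982, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical public data set that still carries names.
public_register = [
    {"name": "Alice Example", "zip": "75001", "birth_year": 1950, "sex": "F"},
    {"name": "Bob Example", "zip": "75001", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "75003", "birth_year": 1990, "sex": "F"},
]

# Assumption: these three attributes appear in both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the quasi-identifier values of a record."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(release, register):
    """Return {name: diagnosis} for register entries that match exactly one released record."""
    by_key = {}
    for row in release:
        by_key.setdefault(key(row), []).append(row)

    reidentified = {}
    for person in register:
        matches = by_key.get(key(person), [])
        if len(matches) == 1:  # a unique match ties a name to a diagnosis
            reidentified[person["name"]] = matches[0]["diagnosis"]
    return reidentified


reidentified = link(hospital_release, public_register)
print(reidentified)
```

In this toy data, two of the three named people match exactly one released record each, so their diagnoses are exposed even though the release contains no names.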
Several privacy and security factors must be taken into consideration when using Big Data analytics for healthcare. For instance, although such analytics can provide insight into huge volumes of heterogeneous data, potential security and privacy breaches pose challenges that hinder appropriate access to the value held within the data. (11)
A Big Data platform must embrace multiple layers of security for data at rest and data in flight. All communications between data sources, data consumers and the Big Data warehouse should be encrypted. Several methods can be applied to ensure data security in Big Data analytics. A traditional method of preventing the disclosure of confidential information is de-identification, i.e. removing any information that can identify the patient, either by removing specific patient identifiers or by a second, statistical method that verifies that enough identifiers have been deleted. The traditional method can be enhanced with concepts such as k-anonymity, l-diversity and t-closeness. Moreover, a hybrid execution model ensures confidentiality and privacy in cloud computing by using public clouds only for non-sensitive data and computation classified as public, i.e., when the organization declares that exporting the data and performing computation on it in public clouds poses no privacy or confidentiality risk; it uses a private cloud for sensitive, private data and computation. Some techniques also apply identity-based anonymization. However, owing to their increased complexity and several limitations, these models need further research and testing, as they are becoming more difficult to interpret and less reliable. (12)
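To make k-anonymity and l-diversity more concrete, the following is a minimal sketch that measures both on a small table. The records, the field names (`age_range`, `zip_prefix`, `diagnosis`) and the choice of quasi-identifiers are hypothetical, and the l-diversity shown is the simple "distinct values" variant; t-closeness is not covered.

```python
from collections import defaultdict

# Hypothetical de-identified records with generalized quasi-identifiers.
records = [
    {"age_range": "30-39", "zip_prefix": "750", "diagnosis": "asthma"},
    {"age_range": "30-39", "zip_prefix": "750", "diagnosis": "diabetes"},
    {"age_range": "30-39", "zip_prefix": "750", "diagnosis": "asthma"},
    {"age_range": "60-69", "zip_prefix": "751", "diagnosis": "hypertension"},
    {"age_range": "60-69", "zip_prefix": "751", "diagnosis": "hypertension"},
]

# Assumption: these attributes could be linked to outside data sets.
QUASI_IDENTIFIERS = ("age_range", "zip_prefix")
SENSITIVE = "diagnosis"


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class: the table is k-anonymous for this k."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


k = k_anonymity(records, QUASI_IDENTIFIERS)
l = l_diversity(records, QUASI_IDENTIFIERS, SENSITIVE)
print(f"k = {k}, l = {l}")
```

In this toy table every combination of quasi-identifiers is shared by at least two records, yet one group contains a single diagnosis, so knowing that a person belongs to that group reveals the diagnosis. This is the kind of gap that l-diversity is meant to expose on top of k-anonymity.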
Patient data security and privacy are crucial in driving the healthcare transformation. As Big Data in healthcare becomes more pervasive with cloud computing, host companies will be more reluctant to share massive healthcare data for centralized processing. Hence, distributed processing across different clouds that draws on cumulative intelligence is foreseen.
The extreme sensitivity of healthcare data makes their confidentiality and integrity crucial, so Big Data security is fundamental in healthcare. Additionally, to provide the best care, healthcare providers must have quick but secure access to a patient’s medical history. Security solutions should protect analytics and secure Big Data frameworks. Laying the right technical foundation is a precondition for successful data analysis.
- Balthazar P, et al. Protecting Your Patients’ Interests in the Era of Big Data, Artificial Intelligence, and Predictive Analytics. J Am Coll Radiol 2018; 15(3 Pt B):580-586.
- Centers for Disease Control and Prevention. Emergency preparedness for older adults; HIPAA, privacy and confidentiality. Available at:
- Mittelstadt BD, et al. The ethics of big data: current and foreseeable issues in biomedical contexts. Sci Eng Ethics 2016; 22:303-41.
- Nunan D, et al. Market research and the ethics of big data. Int J Mark Res 2013; 55:505-20.
- Andrejevic M. The big data divide. Int J Commun 2014; 8:17.
- Puschmann C, Burgess J. Metaphors of big data. Int J Commun 2014; 8:20.
- Choudhury S, et al. Big data, open science and the brain: lessons learned from genomics. Front Hum Neurosci 2014; 8:239.
- Crawford K. The hidden biases in big data. Harvard Business Review. Available at: https://hbr.org/2013/04/the-hidden-biases-in-big-data.
- Tene O, et al. Big data for all: privacy and user control in the age of analytics. Nw J Tech Intell Prop 2012; 11:xxvii.
- Patil HK, et al. Big data security and privacy issues in healthcare. Nanthealth: Dallas, US.
- Rao S, et al. Security solutions for big data analytics in healthcare. Second International Conference on Advances in Computing and Communication Engineering – IEEE, 2015.
- Abouelmehdi K, et al. Big data security and privacy in healthcare: A Review. Procedia Computer Science 2017; 113:73-80.