Over the past decade, personal health data have become far more important in medical research. The global healthcare Big Data market is estimated to grow from $11.5 billion in 2016 to $70 billion in 2025, an increase of roughly 509%.
Healthcare systems are increasingly challenged by growing and ageing populations living with more chronic diseases. More effective and smarter medicine is needed to deliver better patient care through telemedicine and predictive medicine using algorithms and artificial intelligence. However, these new technologies need to process a large amount of personal health data.
Technological innovations in the healthcare industry raise a number of important ethical questions, including how to store and protect personal health data.
Personal data concerning health include all data on a person’s health status that reveal information relating to that person’s past, current or future physical or mental health. Because such data belong to a person’s most intimate sphere, unauthorised disclosure may lead to various forms of discrimination and to violations of fundamental rights. For this reason, Regulation (EU) 2016/679 (GDPR) places health data in a special category of personal data that requires additional protection. Furthermore, Member States can maintain or introduce further conditions, including limitations, for the processing of health data.
In any case, encryption, anonymisation and pseudonymisation are among the most effective ways to protect personal health data.
How to protect Personal Health Data?
Anonymisation and pseudonymisation may seem synonymous, but there is a subtle distinction between the two terms.
Pseudonymisation is the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information (provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person).
By contrast, anonymisation irreversibly alters the personal data so that a data subject can no longer be identified, directly or indirectly, either by the data controller alone or in collaboration with any other party.
Year after year, cybercriminals become more skilled; to defeat them, new pseudonymisation techniques are put in place. The most common are the following:
- Directory replacement: the data concerning the data subject are modified, while a link between the values is preserved. For example, an individual can be identified by a number, while the information that directly identifies that individual, such as a personal identification number, is stored separately. In this way, personal health data are pseudonymised. To achieve anonymisation, the separately stored information that directly identifies the data subject must be deleted.
- Scrambling: in simple terms, scrambling is a pseudonymisation technique that consists of mixing letters. Encryption and hashing are examples of scrambling techniques.
- Masking: part of the information is hidden using random characters or other data.
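The three techniques above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names, the `patient_id` field and the sample records are hypothetical, and HMAC-SHA256 is used here as one possible keyed hashing choice for the scrambling step.

```python
import hashlib
import hmac
import secrets


def directory_replace(records, id_field):
    """Replace a direct identifier with a random pseudonym.

    Returns the pseudonymised records and the lookup directory. The directory
    must be stored separately from the data; deleting it removes the link
    back to the original identifiers.
    """
    directory = {}  # pseudonym -> original identifier
    assigned = {}   # original identifier -> pseudonym (keeps rows linkable)
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in assigned:
            pseudonym = secrets.token_hex(8)
            assigned[original] = pseudonym
            directory[pseudonym] = original
        pseudonymised.append({**record, id_field: assigned[original]})
    return pseudonymised, directory


def keyed_hash(value, secret_key):
    """Scramble a value with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value, visible=2, fill="*"):
    """Hide everything except the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


# Hypothetical sample data, for illustration only.
records = [
    {"patient_id": "010190-1234", "diagnosis": "asthma"},
    {"patient_id": "010190-1234", "diagnosis": "hay fever"},
    {"patient_id": "020285-5678", "diagnosis": "diabetes"},
]
pseudonymised, directory = directory_replace(records, "patient_id")
scrambled = keyed_hash("010190-1234", b"hypothetical-secret-key")
masked = mask("010190-1234")
```

In this sketch the two records of the same patient receive the same pseudonym, so the link between values survives, while the directory that maps pseudonyms back to identifiers can be kept, protected or deleted independently of the dataset.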
A pseudonymisation technique is typically intended to counter the efforts of an adversary to perform a re-identification attack. Controllers and processors of personal data have to consider whether adversaries may be internal or external to the organisation.
An insider is an adversary with specific knowledge, capabilities, or permissions. In the context of pseudonymisation, this implies that the adversary is able to obtain information on the pseudonymisation secret and/or other relevant information. An external adversary, by contrast, has no direct access to the pseudonymisation secret or to additional relevant information, but aims to learn more about the pseudonymised dataset (e.g. by learning the identity behind a given pseudonym and then obtaining further information on that identity from the additional data stored in the dataset for that pseudonym). The pseudonymisation process therefore has to take into account all the pseudonymisation goals of the specific case (from whom do the identifiers need to be hidden? what utility is expected from the data?) as well as ease of implementation.
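The difference between the two adversaries can be made concrete with a short Python sketch. It rests on assumptions that are not in the text above: identifiers are hypothetical four-digit patient numbers, the pseudonym is either a plain SHA-256 hash or a keyed HMAC-SHA256, and the adversary tries every possible identifier (a dictionary attack).

```python
import hashlib
import hmac


def plain_hash(identifier):
    """Unkeyed pseudonym: anyone who knows the function can recompute it."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_hash(identifier, secret):
    """Keyed pseudonym: recomputing it requires the pseudonymisation secret."""
    return hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(pseudonym, candidates, pseudonymise):
    """Return the candidate identifier that maps to `pseudonym`, or None."""
    for candidate in candidates:
        if pseudonymise(candidate) == pseudonym:
            return candidate
    return None


# Hypothetical setup: a small identifier space and a secret held by the controller.
secret = b"hypothetical-pseudonymisation-secret"
candidates = [f"{n:04d}" for n in range(10_000)]

weak_pseudonym = plain_hash("4821")
strong_pseudonym = keyed_hash("4821", secret)

# External adversary: knows the hash function but not the secret.
external_vs_plain = dictionary_attack(weak_pseudonym, candidates, plain_hash)
external_vs_keyed = dictionary_attack(strong_pseudonym, candidates, plain_hash)

# Insider: has also obtained the pseudonymisation secret.
insider_vs_keyed = dictionary_attack(
    strong_pseudonym, candidates, lambda c: keyed_hash(c, secret)
)
```

Under these assumptions, the external adversary recovers the identifier behind the unkeyed hash but not the one behind the keyed hash, whereas the insider who holds the secret recovers it in both cases. This is why the choice of technique depends on which adversary the controller needs to defend against.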
A risk-based approach that follows the principle of privacy by design and by default (Article 25 GDPR) can help in choosing the pseudonymisation technique that best mitigates the relevant privacy threats.
Privacy by Design means embedding data privacy features and privacy-enhancing technologies, such as the most appropriate pseudonymisation technique, at an early stage, starting from the moment the data are collected. This approach helps to ensure better and more cost-effective protection of individual privacy. Privacy by Default means that the default settings of a service must be data protection-friendly: only the data necessary for each specific purpose of the processing should be gathered.
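The "only the data necessary for each specific purpose" rule can be sketched as a default-deny field filter in Python. The purposes, field names and sample record below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical mapping from processing purpose to the fields it may use.
ALLOWED_FIELDS = {
    "appointment_reminder": {"pseudonym", "appointment_time"},
    "research_statistics": {"pseudonym", "birth_year", "diagnosis"},
}


def minimise(record, purpose):
    """Keep only the fields approved for `purpose`.

    Unknown purposes are rejected, so nothing is released by default.
    """
    try:
        allowed = ALLOWED_FIELDS[purpose]
    except KeyError:
        raise ValueError(f"no fields approved for purpose {purpose!r}") from None
    return {field: value for field, value in record.items() if field in allowed}


# Hypothetical record, for illustration only.
record = {
    "pseudonym": "a3f9c1",
    "birth_year": 1985,
    "diagnosis": "asthma",
    "appointment_time": "2024-05-02T09:30",
    "home_address": "1 Example Street",
}
for_research = minimise(record, "research_statistics")
```

Here the research view contains the pseudonym, birth year and diagnosis, and fields such as the home address never leave the source record unless a purpose explicitly lists them.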
Personal health data are crucial for the present and future of medicine. They help clinicians and institutions to make well-informed decisions, to implement artificial intelligence and to reduce costs for hospitals and other medical institutions. However, since health data reveal the most intimate information about people, unauthorised disclosure may lead to various forms of discrimination and to violations of fundamental rights.
For these reasons, it is essential to raise awareness among researchers of the critical issues involved in processing health data and in managing data breaches. A risk-based approach can help mitigate risks to citizens’ rights and freedoms.