The processing of personal genetic data is routinely believed to be subject to data protection regulations, and in particular to the EU General Data Protection Regulation (GDPR). While this is, in general, true, it is important to know exactly when and how far those regulations can affect genetic research and, therefore, the possibility of finding a cure for genetic diseases. Clearly, an actual life-or-death problem.

Introduction

In what would be a perfect fit for Cicero’s famous summum jus, summa injuria, the GDPR and privacy concerns might be interpreted in a way that actually hampers scientific research while providing no benefit whatsoever to the data subject: a clear example of the Golden Law of Stupidity. It is therefore, quite literally, vital that those provisions be enforced with the understanding that data protection is neither the only nor the most important right to be protected, and that there are overriding rights, such as saving human lives and sparing a human being pain, that cannot be compromised by the GDPR or by any other data-protection legislation.

Indeed, on reading the GDPR provisions these statements sound obvious. Recital 2 of the GDPR states that

This Regulation is intended to contribute to the accomplishment of an area of freedom, security and justice and of an economic union, to economic and social progress, to the strengthening and the convergence of the economies within the internal market, and to the well-being of natural persons (emphasis added)

But the reality of “daily operations” shows that a lack of knowledge about how scientific research is carried out, the threat of Data Protection Authority fines, and unscrupulous advice from under-prepared, self-appointed “GDPR experts” place an unnecessary burden on genetic research, a notion that is itself often used, even at institutional level, without a clear understanding of its meaning.

“Genetic research”, indeed, is an umbrella term covering different activities, from the management of genetic-sequence biobanks to bioengineering, and many methods that hardly fit a catch-all description. This is why, when talking about the GDPR and genetic research, the very first thing to do is to look at what the researchers want and by what means they want to achieve it. Only by answering these two fundamental questions is it possible to assess the legal constraints (if any) on the project.

Facts

An example comes from a research paper recently published in Nature, Correction of a pathogenic gene mutation in human embryos, exploring the hypothesis that some genetic mutations might be corrected in human gametes or early embryos by way of the CRISPR-Cas9 technique.

Strictly speaking, this paper has nothing to do with the GDPR, since its authors come from South Korea, China and the USA, but precisely for this reason it is interesting to look at it from an EU viewpoint.

Let’s start with the object of the research: the MYBPC3 gene mutation related to Hypertrophic Cardiomyopathy (HCM), a genetic heart disease, investigated because of its high frequency in the human population:

MYBPC3 mutation is found at frequencies ranging from 2% to 8% in major Indian populations … HCM … has an estimated prevalence of 1:500 in adults and manifests clinically with heart failure.

We understand from this “opening statement” that the research involves the processing of ethnicity data and/or patient-related data.

Further reading reveals that one area of interest is preventing second-generation transmission. This might imply processing data about the patient’s ancestors, relatives and descendants (although probably limited to the yes/no information on whether they have developed the disease):

One approach for preventing second-generation transmission is preimplantation genetic diagnosis (PGD) followed by selection of non-mutant embryos for transfer in the context of an in vitro fertilization (IVF) cycle. When only one parent carries a heterozygous mutation, 50% of the embryos should be mutation-free and available for transfer, while the remaining carrier embryos are discarded. Gene correction would rescue mutant embryos, increase the number of embryos available for transfer and ultimately improve pregnancy rates.

The genetic and personal data of at least one identified individual have been processed by the researchers:

An adult male patient with well-documented familial HCM caused by a heterozygous dominant 4-bp GAGT deletion (g.9836_9839 del., NC_000011.10) in exon 16 of MYBPC3, currently managed with an implantable cardioverter defibrillator and antiarrhythmic medications, agreed to donate skin, blood and semen samples.

to which was added the genetic information of a set of 19 embryos used as a control group, revealing that:

Sequencing of 83 individual blastomeres collected from 19 control embryos revealed that 9 (47.4%) were homozygous wild type (MYBPC3WT/WT) and 10 (52.6%) were heterozygous, carrying the wild-type maternal and mutant paternal alleles (MYBPC3WT/∆GAGT) … This distribution was expected and confirms that the heterozygous patient sperm sample contained equal numbers of wild-type and mutant spermatozoa with similar motility and fertilization efficiency.

The personal data of the donors are known and used without further anonymization, as can be deduced from the informed consent requested from the donors:

Informed consent

The robust regulatory framework set forth by OHSU clearly specified that informed consent could be obtained only if prospective donors were made aware of the sensitive nature of the study. The consent form clearly presented the scientific rationale for the study; stating (in both the Clinical Research Consent Summary and the Purpose section of the consent form) that gene editing tools will be used on eggs, sperm, and/or embryos to evaluate the safety and efficacy of gene correction for heritable diseases. Additionally, consent form language clearly stated that genetic testing would be conducted in addition to creation of preimplantation embryos and embryonic stem cell lines for in vitro analyses and stored for future use. The incidental discovery of genetic information that might be important to the donors’ healthcare is a possible outcome when engaging in this type of research. Informed consent documents provided the donor with the option to receive this information or not (emphasis added). Written informed consent was obtained before all study-related procedures on current, IRB-approved, study-specific consent forms.

This quote shows that the researchers are able to trace back the donors’ identity in every phase of their activity, as is made clear by the option given to donors to be informed of important health-related issues that might emerge as a “collateral effect” of the research.

The patient recruitment part of the research reveals further areas of genetic/health-related personal data processing:

Study participants
Healthy gamete donors were recruited locally, via print and web-based advertising. Homozygous and heterozygous adult patients with known heritable MYBPC3 mutations were sought; however, only three adult heterozygous patients were identified by OHSU Knight Cardiovascular Institute physicians and referred to the research team (emphasis added), one of whom agreed to participate in the study.
…
Controlled ovarian stimulation
Research oocyte donors were evaluated before study inclusion as previously reported.

As always in this kind of research, the data subjects involved belong to two categories: the healthy (acting as a control group) and the disease-affected. They are selected through a pre-screening that involves the processing of sensitive data, but the paper does not describe the personal data safeguards adopted at this stage of the research.

To summarize, this is the personal information gathering process that can be deduced from the paper:

a group of scientists belonging to different institutions located in different parts of the world defines the goal of the research and its object(s), such as biosamples, cell lines etc.,
the research group runs a direct search for healthy people, and asks a hospital to select patients potentially interested in the research,
both the patients’ and the healthy people’s data are processed throughout the whole research, so that each individual’s identity is known to the researchers,
while informed consent has been requested from the people who agreed to participate in the study, there is no information about how the pre-screening phase was carried out by the hospital that selected the prospective patients,
a specific and identified individual has been the target of the researchers’ attention,
the researchers know that the patient’s ancestors and descendants might be ill as well, but the paper carries no information on whether the scientists know the identity of the patient’s relatives,
part of the research is based on statistical information involving ethnic origins.

The EU-centric GDPR perspective

From an EU-centric information protection perspective this process shows some grey areas in the processing of personal information, mainly in the way information is provided to, and consent is gathered from, the people involved.

An ideal model for handling genetic/health-related personal information according to EU legislation should assess, as a first step, which of the collected data fall within the GDPR definition of personal information (information that identifies a natural person or makes her identifiable).

Simple things first: the ethnicities of the disease-affected people are not subject to the GDPR, since this is just statistical information, while selected (and rejected) patients, their relatives and healthy people clearly fall into the data-subject category.

Now the difficult one: the information extracted from the parts of DNA that are “cut” by the CRISPR-Cas9 method might not easily be included in the legal definition of “personal information”.

There is a point, indeed, where information related to an individual loses its “quality” and becomes “neutral”, and is therefore no longer regulated by the GDPR: no reasonable person would argue that knowing the chemical elements we are made of amounts to processing personal information.

Coming to a less extreme example, the portion of DNA to be edited to remove a defective part clearly belongs to a specific human being, but the information it provides loses its “identifying power”, precisely because the results of the editing process can be extended to whoever shares the same pathological condition and are no longer unequivocally connected to a single, specific person. In other words, this is a variation of the classical Sorites Paradox that, in its legal dress, becomes: is it possible that at a certain point, after an item of information related to an individual has been torn into pieces, those pieces, while still connected to him, lose their quality of “personal information”?

The second thing to assess is which of the collected data fall within the GDPR scope (protection of personal information to be processed by way of a filing system) and now things become tricky.

As far as can be told from reading the paper, the control group information has been somehow organized in a structure and therefore should surely be processed according to the GDPR, at least in relation to the incidental discovery of health issues, whereas the data of the single patient participating in the research are processed individually. This means that, while being “personal information”, they might not be subject to the GDPR because no filing system is part of the processing.

Third, the purpose of the processing must be defined. As the scientists clearly state, the goal of the research is to look for a way to use CRISPR-Cas9 to “delete” a specific mutation. This means that the processing of the genetic information of the people involved is a means to an end, and not the end in itself. In other words, the researchers’ expected outcome is an answer to a general question and not a result tailored to the patient’s needs or conditions.

However thin this distinction may be, there is a difference between the processing of personal information as, no pun intended, the purpose of the processing, and the processing of personal information as a way to manipulate chemical, person-unrelated portions of a genetic compound.

Conclusions

Scientific (and, in particular, genetic) research is backed by a more-than-legitimate interest, and the way it works makes it very hard to collect consent to personal information processing for every collateral path of investigation that might stem from the main research tree.

The solution of choice should be a balanced approach between the rights of the data subject and the “greater good” pursued by the scientists, one that takes into account, as the GDPR itself says, the risk to dignity and other fundamental rights, as well as the need to let Science run as fast as it can.

So, to come to practical conclusions, the paper discussed in this article shows that several kinds of personal information are involved, along with several different processing purposes carrying different levels of risk.

Statistical, ethnicity-related information is simply outside the GDPR’s reach.

While not at high risk, patients’ clinical information during the selection phase, selected patients’ personal information and control group information should, as soon as they are processed by way of a filing system, be handled under the control of the hospital’s ethics committee. This means informing both the patients themselves and the people belonging to the control group, and gathering a broad consent, as stated by the GDPR’s Recital 33:

It is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection. Therefore, data subjects should be allowed to give their consent to certain areas of scientific research when in keeping with recognized ethical standards for scientific research. Data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects to the extent allowed by the intended purpose.¹

Of course, if the process is fully anonymized, for instance by having the hospital keep the patient/volunteer identity separate from the rest of the medical record and having a third party collect and organize the information, the researchers would not be subject to the GDPR because there would be no way for them to trace back the participants’ personal identities.

Should this anonymization process become a standard, it would represent an acceptable balance between data subjects’ rights and Science’s needs.
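
The separation described above (the hospital keeping identities apart from the rest of the record, and a third party organizing the information) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any real system or of what the paper’s authors did: the class names, the field names and the random-code scheme are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Hospital:
    """Hypothetical hospital: the only holder of the code-to-identity link."""
    _code_to_identity: dict = field(default_factory=dict)

    def enroll(self, name: str, clinical_record: dict) -> dict:
        # Random study code, not derived from the name (assumed scheme).
        code = secrets.token_hex(8)
        self._code_to_identity[code] = name
        # Only the code and the clinical facts leave the hospital.
        return {"code": code, **clinical_record}

    def notify(self, code: str, finding: str) -> str:
        # Incidental findings go back through the hospital,
        # which alone can resolve a code to a person.
        return f"{self._code_to_identity[code]}: {finding}"


@dataclass
class ThirdParty:
    """Hypothetical intermediary that collects and organizes coded records."""
    records: list = field(default_factory=list)

    def collect(self, coded_record: dict) -> None:
        self.records.append(coded_record)

    def dataset_for_researchers(self) -> list:
        # Researchers receive copies of the coded records only.
        return [dict(r) for r in self.records]


hospital = Hospital()
third_party = ThirdParty()
third_party.collect(hospital.enroll("Jane Doe", {"mybpc3_carrier": True}))
third_party.collect(hospital.enroll("John Roe", {"mybpc3_carrier": False}))

dataset = third_party.dataset_for_researchers()
```

In this sketch the researchers only ever see `dataset`; resolving a code back to a person requires going through the hospital.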

The part that is still to be discussed is the one related to the actual DNA manipulation. While the DNA clearly belongs to an identified subject, the aim of the data processing relates to the behaviour of a chemical compound and not to a natural person as such.

Again, under a fair balance between individual rights and public needs, there is room to conclude that in this latter case the GDPR should not be enforced.

¹ Nota Bene: while Recital 33 does make it possible to extend consent to research not yet carried out, it is important to understand that the areas of research must be clearly indicated. On the one hand this sounds reasonable; on the other hand, the need to clearly specify the area of research might create future hurdles when, by chance, specific personal information turns out to be useful in a field different from the original one. Going back to the patient to extend the consent could be a blocking factor in terms of cost and administrative burden.