UK Biobank Data Debate
Lords Chamber
Lord Vallance of Balham (Lab)
I thank the noble Lord for his questions, and I echo his points about the Front-Bench spokesmen on the other side. It is very clear that everyone has the same intent here, which is to try to sort this out, and I welcome those inputs.
I agree that this is cultural as well as technical; those points need to be looked at and will be as part of the review. There is an unwavering commitment to UK Biobank; it is an extraordinarily important resource for the future health of the country and for ensuring that new discoveries are made. We will continue to support UK Biobank.
Lord Tarassenko (CB)
I declare two conflicts of interest. First, I was a participant in UK Biobank, having been recruited in Oxford in 2007. My wife is also a participant and has been far more assiduous—for example, she underwent whole-body imaging about two years ago, which took a whole day. We are both very glad to be participants, but I do not think that the risk of reidentification is that high. It is not zero, but it is a very low probability. That is why I am happy to declare that conflict of interest.
Secondly, I was a UK Biobank principal investigator in a study carried out about 10 years ago, which was to develop AI algorithms for the automated identification of atrial fibrillation and which had about 100,000 participants, who undertook a test on an exercise bike as part of the UK Biobank dataset acquisition. In those days—before 2024, as the Minister said—data would be transferred and held securely on our servers under a material transfer agreement. When the paper was eventually published in 2020, we deleted all data on our servers, as all principal investigators are meant to do. As we have already heard, UK Biobank shifted its policy to a cloud-first model to enhance security, so what happened in the study I was involved in no longer takes place.
My question is about legacy data from before 2024. Does UK Biobank have any estimate, even a semi-accurate estimate, of non-compliant pre-2024 principal investigators? Does the Minister agree with me that UK Biobank should work with data privacy researchers (for example, those from the Oxford Internet Institute, as was mentioned by the noble Lord, Lord Clement-Jones) to be much more proactive at identifying non-compliance, as part of these investigations?
Lord Vallance of Balham (Lab)
Let me thank the noble Lord, Lord Tarassenko, and his wife for participating in UK Biobank, because the whole thing depends on that. He is quite right that, before 2024, the data was downloaded and people did their research on downloaded data. I have had this discussion with the chair of UK Biobank, which is going through a process of recontacting all the institutions—because this is an institutional agreement—to confirm that the data that was downloaded has been deleted. No further access will be granted until that is proven. That process is important, because that residual downloaded data is most vulnerable.
Lord Vallance of Balham (Lab)
UK Biobank, for all the reasons stated, is expensive to run, and it is run with a mix of funding from government, charities and industry, with the major funders being the UK Government and the Wellcome Trust over many years. The principle of it has been to give access to people; therefore, there is not a big cost put on its users. On our approach, we knew that the leak was in China, and we therefore immediately asked the embassy in China to link to the Government there to see if they could help us get these taken off the website. We did not make any conclusion about where they had come from; we just thought that that would probably be the fastest way to get these removed.
Lord Tarassenko (CB)
May I ask the Minister a follow-up question to the one I asked previously? He is absolutely right that UK Biobank should get in touch with the institutions where the principal investigators are based, but a lot of inadvertent leakage, if you will, of the data occurs from the researchers themselves—the principal investigators—who, believe it or not, will put the data on GitHub. They may leave the institution and go and work somewhere else while the data remains on their GitHub. That is why I asked whether the UK Biobank board could be a little more proactive and ask researchers from the Oxford Internet Institute, for example, who are very capable at looking at those types of issues, to look at individual GitHub sites and other sites where the data may still be, even though the institutions which those principal investigators were at would not be aware of it.
Lord Vallance of Balham (Lab)
Yes, we are very aware of the possibility that there are things on GitHub. There has been a GitHub issue related to this, which was identified earlier this year, and that will be part of what UK Biobank looks at. Going forward, that will not be possible because of the inability to download.