UK Biobank Data Debate
Lords Chamber
My Lords, I thank the Minister for coming forward in relation to this Statement and join in acknowledging unreservedly the profound scientific value of UK Biobank and the extraordinary generosity of the half a million volunteers whose participation has driven life-saving discoveries in heart disease, cancer, dementia, Parkinson’s, and Covid immunity. I emphasise that nothing I say today diminishes that contribution or our commitment to seeing UK Biobank continue to thrive at the heart of the UK’s sovereign health data strategy. But we owe those volunteers honesty, and the honest description of what has happened here, as my honourable friend Victoria Collins said in the Commons last week, is that it was
“a profound betrayal of the people who trusted this institution with some of the most intimate details of their lives”,—[Official Report, Commons, 23/4/26; col. 472.]
including their sleep patterns, mental health, genetic data and medical history.
We welcome the swift removal of the three listings, the co-operation of the Chinese authorities, the self-referral to the ICO, the board-led review, and the development of what UK Biobank describes as the world’s first automated checking system. These are the right steps, but they are steps taken after the fact, and this House is entitled to ask how we arrived here. UK Biobank has apologised for the concern caused—that is not sufficient. We join our Commons Liberal Democrat colleagues in calling for a full and unequivocal apology to participants, not for causing concern but for the breach of trust itself.
We also cannot accept the framing that this was simply a matter of a few bad apples breaking their agreements. The platform allowed data to be downloaded. As the Minister himself confirmed in the Commons,
“this was not … a cyber-attack. This was a legitimate download … by a legitimately accredited organisation”.—[Official Report, Commons, 23/4/26; col. 473.]
That is precisely the problem: contractual promises are not an adequate safeguard for data of this sensitivity. There must be hard, technical barriers, and we are glad that a solution is now being implemented. The question is why it was not in place from the outset.
I have a series of questions for the Minister. First, on the scale of exposure, an associate professor from the Oxford Internet Institute has stated publicly:
“This is the 198th known exposure of UK Biobank data since last summer”,
and that UK Biobank data remains available online for anyone to download today. Will the Minister confirm how many data breaches at or by UK Biobank have been notified to the Government since the original ministerial Statement, and does the Minister have any reason to believe it will not become public that Biobank data has already been used to reidentify specific participants?
Secondly, on leadership and accountability, given the series of decisions, or failures of decision, that have brought us to this point, including the dismissal of earlier warnings, does the Minister have full confidence in the current leadership of UK Biobank? The board-led review is welcome, but its credibility will depend on its independence and transparency.
Thirdly, on reidentification risk, UK Biobank itself acknowledges that it cannot guarantee absolute confidentiality. Modern AI and social media make reidentification far more feasible than was the case when this data was first collected. Crucially, do the Government have contingency plans for large-scale reidentification of Biobank participants, given that, as the Oxford Internet Institute confirms, the data has leaked on nearly 200 occasions, as I mentioned earlier, and remains accessible online?
Fourthly, on the broader lesson for data and AI policy, this incident demonstrates something important: there is no panacea in simply handing patient data to AI systems and trusting that good intentions will follow. So much NHS and Biobank data has already been used in ways that violate the rules under which it was shared. As the Minister in the Commons acknowledged, this was a legitimate download—the rules failed to prevent it. If tearing up data governance rules produced easy wins, we would have seen the evidence by now. Instead, we have received repeated failures, and the Government must reflect on that when designing the new guidance on research data controls that they have promised.
Fifthly and finally, on system-wide lessons, can the Minister confirm that other UKRI and MRC cohort studies will be required to learn from this incident and that their governance will be reviewed? Will the Secretary of State require UK Biobank to publish a full step-by-step plan for reforming its data privacy—not guidance, not reassurances, but binding commitments? The volunteers who built UK Biobank did so in a spirit of trust and public service, and they deserve nothing less than ironclad protections, genuine accountability and the knowledge that their generosity will never again be treated as a governance afterthought.
The Minister of State, Department for Energy Security and Net Zero and Department for Science, Innovation and Technology (Lord Vallance of Balham) (Lab)
My Lords, I am grateful to the noble Lords, Lord Markham and Lord Clement-Jones, for those responses and questions. The Government agree that this is unacceptable; it is an abuse of UK Biobank’s data and something that we take extremely seriously, and it needs robust, technical solutions.
I start, though, by agreeing with both noble Lords in thanking the UK Biobank participants. The UK Biobank dataset is and remains critical in supporting scientific discoveries. It is quite an extraordinary resource that improves health in many of the ways described, such as predicting dementia and early warning signs for cancer, or finding genetic markers for stroke. This resource is probably the most powerful in the world to do that and none of it would have been possible without the generosity, support and trust of participants. That is why, importantly, this needs a complete and robust response.
I know that UK Biobank has apologised to its participants for what has happened. However, let me put on record the Government’s thanks to all the participants and give my assurance that we will get to the bottom of this and have a robust answer. I extend my thanks to the researchers who are working on these discoveries. We want to make sure that we can get it to be usable again so that this work can continue, but we must protect the data and the participants, as the noble Lord, Lord Clement-Jones, said.
I hope the House will recognise that the Government have acted quickly and seriously. We were made aware of this issue on Monday 20 April and took immediate action. First, within hours of it being raised, we had worked with the embassy in Beijing, the Chinese Government and Alibaba to have the relevant listings removed. They put in place measures to prevent listings being put up again in the same way and to automatically identify and remove relevant adverts. Secondly, we asked UK Biobank to immediately revoke access for the research institutions identified as the source of the information. Thirdly, we asked UK Biobank to stop access to its platform until a solution can be found.
On the point raised by the noble Lord, Lord Markham, that solution has to be a technical one. There are ways to do this, such as secure data platforms that stop people being able to download data. One thing worth reflecting on is that UK Biobank started in 2003. Its data became available in 2012; it was at the forefront of protecting data when it started and had robust mechanisms as to who could access it. What has happened is that, as the dataset has become very large, it has not kept up with the changing requirements for this, which is what needs to be put in place now.
The fourth action we took was to ask that participants should be informed immediately.
On what is going to happen and whether the approach from UK Biobank itself is robust, Members of this House will be familiar with the noble Lord, Lord Kakkar. He is the chair of UK Biobank and has assembled a team, including cyber experts, to undertake an urgent, in-depth review of what happened and why. That team will provide its findings to the board, and to us, on or before 10 May.
Further, as has been asked, the Government will issue new guidance on the control of data from research studies. This was in train anyway and will, I hope, be out within the next few weeks. It will apply to all the resources in the UK which are used in this way. Most of them—we think probably all of them, apart from UK Biobank—use a secure data platform, which has the controls.
The UK Biobank resource is important and people volunteered to be part of it because of the benefits it brings for others and for future generations. We need to work with UK Biobank to ensure that researchers with a legitimate need to use the datasets can resume their research, but we must put the participants first and foremost. I agree with the noble Lord, Lord Markham, that this has to be stopped; there has to be a system that can stop this, not just a process.
In the meantime, UK Biobank continues to monitor whether new listings have emerged, because data was downloaded in the past. Up until 2024, it was possible to download data and that was the system used. There was a trust system, backed up by legal contract, but this has, as we know, been breached. New listings will emerge—there have been additional listings posted since the Government were made aware of the issue last week—and we continue to work with the Chinese Government to remove them quickly. While it is now not possible for new downloads of UK Biobank data, there remains a risk that new listings will emerge from data downloads that happened in the past. We will keep the participants and this House updated.
In answer to the question of the noble Lord, Lord Clement-Jones, about the number of breaches, a high number have occurred: most of them are not very significant, but some are significant and all of them are unacceptable. That is what needs to happen. I want to be clear, though, that the need to get these datasets used by researchers around the world pulls in the opposite direction to the need to keep them 100% safe. Therefore, there has to be a system, which is where the secure data environments are so important.
In answer to the questions from both noble Lords about identification, the UK Biobank advises that information such as names, addresses, exact date of birth and NHS number are removed from all the data before it is made available. We do not think any of that was available and we are not aware that any participant has been identified. We also do not believe that there were any purchases of the three listings before we managed to get them taken down. However, we welcome the in-depth board-level review being undertaken, which needs to be comprehensive and cover technical, cultural and process issues. In answer to the noble Lord, Lord Clement-Jones, it is increasingly possible to triangulate in large datasets and get close to identification, and that remains a very real risk.
Turning more broadly to other points, the institutions the data originated from have had their access revoked. While UK Biobank has worked to secure its platforms, all access and downloads have been paused globally. I note, though, that the Chinese Government have been very supportive in getting these listings removed.
The Government are reviewing the way in which we share biodata. A commitment of the biological security strategy 2023, which I think the noble Lord, Lord Markham, referred to, was to reduce the risk of sensitive data being exploited for harmful purposes while maintaining legitimate research collaboration. This will include seeking to harmonise the security policies of the major holders of all genomic data in the UK. We expect to conclude this work over the next few weeks.
The point made about the Cyber Security and Resilience Bill is important, as raised by the noble Lord, Lord Markham. The Bill grants the Secretary of State new powers to issue national security directions to regulated entities or regulators where the compromise or the threat of a compromise to their network and information system poses a national security risk. The use of these powers will always be underpinned by robust intelligence from GCHQ, including, where relevant, information about state actors involved in cyber threats. Minister Narayan explained in the other place that a register of foreign actors is therefore unnecessary in this particular context. We are committed to transparency. The Government are already able to communicate with Parliament and the public about such cyber security risks where it is appropriate to do so.
I end by saying again how important UK Biobank is, how unique it is worldwide in its breadth and depth of coverage, and how appalling it is that this leak occurred. We must make absolutely sure that this risk is eliminated going forward by making sure that a secure data environment is put in place. In 2024, a requirement was made for UK Biobank to put in place an airlock and the requirements that we are now talking about. On 26 January, we asked UK Biobank to put an airlock on the research access platform that it has been using since 2024. Pre-2024, it was all downloads and, post-2024, it is a research access platform, but, unfortunately, that was still downloadable. That is the bit that needs to be stopped now.
My Lords, I declare an interest in the area of cyber security. First, we should congratulate the Government, despite the seriousness of this, on their swift response. Admittedly, it was after the event, but the Government acted on this occasion swiftly and effectively. Secondly, although this was not a cyber attack, changing the behavioural aspects which led to this leak will not be sufficient. As the Minister and various others have said, it will require a range of cultural, behavioural and technical effects to try to minimise the chances of this happening again. In that context, both opposition spokesmen made worthy recommendations and suggestions, which the Minister will no doubt look at.
Finally, given the importance of UK Biobank and the crucial role it plays in scientific research, to the benefit of the health of us all, can the Minister assure us that, although it is an independent charity, this will not undermine the Government’s support for it in the future? It is important to the health of the nation.
Lord Vallance of Balham (Lab)
I thank the noble Lord for his questions, and I echo his points about the Front-Bench spokesmen on the other side. It is very clear that everyone has the same intent here, which is to try to sort this out, and I welcome those inputs.
I agree that this is cultural as well as technical; those points need to be looked at and will be as part of the review. There is an unwavering commitment to UK Biobank; it is an extraordinarily important resource for the future health of the country and for ensuring that new discoveries are made. We will continue to support UK Biobank.
Lord Tarassenko (CB)
I declare two conflicts of interest. First, I was a participant in UK Biobank, having been recruited in Oxford in 2007. My wife is also a participant and has been far more assiduous—for example, she underwent whole-body imaging about two years ago, which took a whole day. We are both very glad to be participants, but I do not think that the risk of reidentification is that high. It is not zero, but it is a very low probability. That is why I am happy to declare that conflict of interest.
Secondly, I was a UK Biobank principal investigator in a study carried out about 10 years ago, which was to develop AI algorithms for the automated identification of atrial fibrillation and which had about 100,000 participants, who undertook a test on an exercise bike as part of the UK Biobank dataset acquisition. In those days—before 2024, as the Minister said—data would be transferred and held securely on our servers under a material transfer agreement. When the paper was eventually published in 2020, we deleted all data on our servers, as all principal investigators are meant to do. As we have already heard, UK Biobank shifted its policy to a cloud-first model to enhance security, so what happened in the study I was involved in no longer takes place.
My question is about legacy data from before 2024. Does UK Biobank have any estimate, even a semi-accurate estimate, of non-compliant pre-2024 principal investigators? Does the Minister agree with me that UK Biobank should work with data privacy researchers—for example, those from the Oxford Internet Institute, as was mentioned by the noble Lord, Lord Clement-Jones—to be much more proactive at identifying non-compliance, as part of these investigations?
Lord Vallance of Balham (Lab)
Let me thank the noble Lord, Lord Tarassenko, and his wife for participating in UK Biobank, because the whole thing depends on that. He is quite right that, before 2024, the data was downloaded and people did their research on downloaded data. I have had this discussion with the chair of UK Biobank, which is going through a process of recontacting all the institutions—because this is an institutional agreement—to confirm that the data that was downloaded has been deleted. No further access will be granted until that is proven. That process is important, because that residual downloaded data is most vulnerable.
My Lords, I am grateful to my noble friend for coming to the House about this very serious incident, and I welcome what the Government are doing to try to prevent it happening again. I echo what has been said about the vital importance of UK Biobank. Nevertheless, I have two quick questions for my noble friend. First, the Statement says that UK Biobank has revoked access to the research institutions that were identified as the source of the leak. Can my noble friend tell the House what those research institutions are? Secondly, in regard to the self-reference by UK Biobank to the Information Commissioner’s Office, what exactly does my noble friend hope will result from that self-reference to ensure that this does not happen again?
Lord Vallance of Balham (Lab)
I thank my noble friend for the question. UK Biobank has identified the sites that were responsible for the leak. There were three institutions: the Second Xiangya Hospital, China-Japan Union Hospital, and Beijing Chaoyang Hospital. Those institutions have been contacted, to be dealt with as discussed. Sorry, I have forgotten my noble friend’s second question.
It was about the Information Commissioner’s Office.
Lord Vallance of Balham (Lab)
UK Biobank referred itself to the ICO, which was the entirely appropriate thing to do. The matter is now in the ICO’s hands.
My Lords, my question to the Minister follows on from the previous question. How will we know that the necessary cultural revolution among stewards of medical data has taken place? What has happened—that UK Biobank allowed the download of the data—is frankly astonishing. I must admit, I have been worried about this sector since I was first involved in the drafting of the GDPR as an MEP nearly 15 years ago. I have also felt that the UK authorities were a little casual in the information given to patients. I do not know if anyone remembers care.data, with the obfuscation about opt-ins and opt-outs and moving the goalposts and so on. The chief executive of UK Biobank made the usual sort of statement:
“We take the protection of participants’ data extremely seriously and do not tolerate any form of data misuse”,
whereas other, more realistic commentators have used adjectives such as “supremely careless”, “irresponsible” and “cavalier”.
How will the Information Commissioner get proactive in this area? It is a betrayal of patients’ trust. Of course we want research. I have been tangentially involved in research for a cure for diabetes, because my late husband was type 1, and he was involved in this area. The fact is that this was re-identifiable data. Some press referred to it as “anonymised”; it was not anonymised. If it was anonymised, it would not be personal data. It is capable of being re-identified if you know some things about someone that are in the database. As Sam Smith of medConfidential said:
“If I knew that someone had a kidney removed on a particular week in June 2021 as Wes Streeting did, I would know everything else about them”.
I do not know whether that is true, but these guys know what they are talking about. As I said, how do we make sure that there really is a cultural revolution? Otherwise, we are not going to have patients’ trust in this essential area.
Lord Vallance of Balham (Lab)
That last point is fundamental: we must have trust. If we do not get trust, we lose the ability to use what is an extraordinary resource, so that has to be part of what is looked at here. It is absolutely part of what the board of UK Biobank is doing in its review, and it needs a very clear look.
When it started, UK Biobank was at the forefront of protection. It had very robust mechanisms to scrutinise researchers and institutions and make sure that this was properly looked after. What has happened is that things have overtaken UK Biobank and it has not kept up, whereas others have put in a secure data platform to try to deal with these issues, so there is a question there. Part of this is because of the sheer size of the database, but it is not excusable, and this needs to be sorted out.
Going back to the point made by the noble Lord, Lord Tarassenko, it is theoretically possible to re-identify people. It is not at all easy, and it is a low probability, but it is not zero probability. Therefore, I agree with the point that this is a real wake-up call for researchers. We need to make sure that we build the right trust in. We are putting together the Health Data Research Service, with this at the very heart of what it is going to do to make sure that there is trusted access to this type of data.
My Lords, I declare an interest as a participant in UK Biobank. It is not the first time I have said that: when I was Secretary of State, I said it in the context of encouraging people to support UK Biobank. Does the Minister agree that, notwithstanding this lamentable abuse of the data, those of us who are participants see such value emerging from UK Biobank that we think we should happily continue to volunteer our data and our services whenever we are asked? However, it surprises me, as somebody who has received emails from UK Biobank, that since this was in the press I have received nothing, although it says that it is going to contact all participants and the Statement says so. I am surprised that eight days have gone by and nothing has emerged.
The website of UK Biobank says—the noble Lord, Lord Clement-Jones, referred to it—that it intends to have what it describes as an automated data-checking scheme in place by the end of the year. Can the Minister kindly tell us a bit more? What would that add and why is it important? It seems to me that what we are looking for in this age of AI systems is something which not only prevents unauthorised access but is capable of identifying every subsequent use of that data wherever that data may have been provided under the licence.
Lord Vallance of Balham (Lab)
Again, let me thank another participant at UK Biobank. One of the features I have found whenever I have met UK Biobank participants or visitors is how incredibly altruistic everybody is: they want to do it for the common good. That is a very common theme, and I am sure that that is going to be the response now. As for contact, we asked UK Biobank to contact all participants immediately. I understand that it does not have an email address for about half of the participants, so it has written, and I believe it has sent emails to all those it has email addresses for. As to what happens next, I agree that technological changes are so fast that this has to be something that keeps up with that. The first step, I think, is to put in one or two very clear airlocks, before you get to the data, that stop you being able to export the data. That is the immediate concern. Then there are ways in which it is possible to see where data has gone, and these things will be looked at as part of the review that is going on.
My Lords, I am extremely interested in what the Minister said about leaks and UK Biobank. With his undoubted knowledge of these things, can he go back to whether there is any possibility that Covid came from activity in China, as has been suggested by an author who is here? Secondly, does he think that anybody who was responsible for getting us to close down in that disastrous lockdown should be held responsible in any way?
Lord Vallance of Balham (Lab)
This is somewhat off topic. In terms of China, I think it is very clear that there are three possibilities for where Covid came from. One is that it was a natural infection that spilled over from bats, with billions of chances for that to happen. The second is that an infection was taken into a lab and there was a lab leak at some point. The third is that it was designed in some way. I think that the last of those is very unlikely indeed, and that is what most people think. We cannot really distinguish between the first two by any other way than biosecurity services.
My Lords, I thank the Minister for his reply on the Statement, and I commend the Government for taking immediate action when this data breach was known. Last Tuesday, the Science and Technology Committee took evidence from the chief executive of UK Biobank, and some of the issues brought out today about data security were brought out then. Two days later, we got the news about the Chinese data breach. The lessons here are about public confidence in other research. The Minister referred to the Health Data Research Service that will soon be established and the Genomics England data. Can he assure us that the public can have confidence that those two organisations sharing their data will be secure and that the Government will issue new guidance to every organisation that will use health data for research purposes?
Lord Vallance of Balham (Lab)
I can absolutely assure the noble Lord that the Government will issue guidance. That guidance was in development anyway, and I expect it to come out within the next few weeks. I can also assure him that other platforms use secure data platforms where the downloading of data is not possible. There was a rather unusual situation with UK Biobank where the data was downloadable, which is not true for many others. We absolutely need to use this to build confidence that these data are properly looked after and used for the purposes for which they were given.
My Lords, I will ask the Minister a couple of questions. This breach has had a silver lining: to remind us that the UK Biobank is a remarkable project and an act of British soft power—and, indeed, altruism—which has been used by 22,000 researchers in 60 countries and produced 18,000 research papers. It really is remarkable; when we beat ourselves up in this country, it is useful to be reminded about the remarkable things we are capable of doing. I described it as an act of altruism, but what is the cost of remedying, as it were, the procedures that led to the breach, and is the cost of the UK Biobank shared by the institutions that use this remarkable resource all around the world, or does it fall entirely on the British taxpayer? Can the Minister also comment on the role of the Chinese Government? It seems to me that we were quick to reach a conclusion when this story initially broke that, somehow, they were involved, but, as is mentioned in the Statement, this was simply a theft that took place in China and did not in any way involve bad faith by the Chinese Government.
Lord Vallance of Balham (Lab)
UK Biobank, for all the reasons stated, is expensive to run, and it is run with a mix of funding from government, charities and industry, with the major funders being the UK Government and the Wellcome Trust over many years. The principle of it has been to give access to people; therefore, there is not a big cost put on its users. On our approach, we knew that the leak was in China, and we therefore immediately asked the embassy in China to link to the Government there to see if they could help us get these taken off the website. We did not make any conclusion about where they had come from; we just thought that that would probably be the fastest way to get these removed.
Lord Tarassenko (CB)
May I ask the Minister a follow-up question to the one I asked previously? He is absolutely right that UK Biobank should get in touch with the institutions where the principal investigators are based, but a lot of inadvertent leakage, if you will, of the data occurs from the researchers themselves—the principal investigators—who, believe it or not, will put the data on GitHub. They may leave the institution and go and work somewhere else while the data remains on their GitHub. That is why I asked whether the UK Biobank board could be a little more proactive and ask researchers from the Oxford Internet Institute, for example, who are very capable at looking at those types of issues, to look at individual GitHub sites and other sites where the data may still be, even though the institutions which those principal investigators were at would not be aware of it.
Lord Vallance of Balham (Lab)
Yes, we are very aware of the possibility that there are things on GitHub. There has been a GitHub issue related to this, which was identified earlier this year, and that will be part of what UK Biobank looks at. Going forward, that will not be possible because of the inability to download.
My Lords, we are due to consider the Pension Schemes Bill, due from the other place. We will adjourn during pleasure until a point shown on the annunciator.