Data (Use and Access) Bill [HL]

Lord Winston (Lab)

My Lords, I sat through the Committee stage and did not speak, because I was so clear that the amendment tabled by the noble Viscount, Lord Colville, was really inappropriate. Indeed, it was the speech by my noble friend Lord Vallance that made me feel that speaking was unnecessary. I regret that, because I should have pointed out something very important.

First, to come back to what the noble Baroness has just said, the definitions in the Bill are flawed to start with. Devising new applications of available knowledge is not the work of scientists; it is essentially the work of technologists—and there is a big difference between technology and science. Technology has all sorts of downsides that we do not expect. Science is simply knowledge and, as such, it does not have an ethical dimension. That has been stated by many distinguished philosophers in the past.

I suggest to noble Lords that every invention and innovation that we make has a downside that we do not expect. I would challenge the noble Viscount, Lord Colville. I have sat on a scientific committee with him and we have enjoyed each other’s company, but I would argue that there is not a single case where he could show that there has been no unexpected downside—as there is with every scientific invention. That, essentially, is why this is a flawed proposition.

There are three problems facing humanity and the world at the moment. One, clearly, is nuclear war; one, clearly, is climate change; and perhaps the most important one is the risk of antibiotic resistance. It turns out that all these risks were completely unseen when those technologies were started. Marie Curie, working on pitchblende and radium, had no idea that the radiation she was working with would be dangerous to her and would, unfortunately, lead to her death. She certainly did not recognise that there might be such a thing as nuclear explosions.

When it comes to, for example, climate change, it is obvious that we would not wish away the Industrial Revolution and all the things we got from it, just as we would not wish away X-rays. But we must understand that the control has to come later.

Lastly, and most importantly for me, infection is one of the greatest problems that humanity faces on earth at the moment. The invention of antibiotics has actually made that worse, because we now have a rising number of infections that are due to resistance. Therefore, I beg the noble Viscount not to press his amendment. I believe that it is absolutely well-meaning, and I understand what he is trying to say, but I hope that the Opposition, when they listen to this, will consider at least abstaining from voting on it, because the risk is that, as it stands, it might bring the House of Lords into disrepute.

Lord Tarassenko (CB)

My Lords, I am well placed to speak on Motion 43A, as someone with regular day-to-day experience of scientific research. Since I started my PhD in 1981, I have had the privilege of spending more than half my working life doing scientific research in the UK—the last 20 years working with very sensitive patient data. Most of that research has been carried out in an academic setting, but some of it has been in collaboration with medtech, AI and pharmaceutical companies.

This research has required me to become familiar with many three-letter and four-letter acronyms. Noble Lords will know about DBS, but they might not know about RSO, TRO, HRA, LREC, MREC, CAG and IRAS, to name just a few. I have spent hundreds of hours working with clinical colleagues to fill in Integrated Research Application System—IRAS—forms. IRAS is used to apply for Health Research Authority—HRA—approval for research projects involving the NHS, social care or the criminal justice system. I have appeared before not only medical research ethics committees, or MRECs, which test whether a research protocol is scientifically valid and ethical, but local research ethics committees, or LRECs, which consider the suitability of individual researchers and local issues.

I was involved in a research project which reused data acquired from patients on a Covid isolation ward during the first two waves of the pandemic. That research project sought to understand how nurses interpreted continuous data from the clinical-grade wearables we used to monitor these high-risk patients during Covid. It took our research team more than 18 months to obtain the relevant permissions to reuse the data for our proposed analysis. Our application was reviewed by the Confidentiality Advisory Group—CAG—which provides independent expert advice on the use of confidential patient information without consent for research and non-research purposes. CAG already considers whether accessing the confidential data is justified by the public interest. Its advice is then used by the HRA and the Secretary of State for Health and Social Care to decide whether to grant access to the confidential data.

The existing provisions in this country to allow access to data for research purposes are stringent, and it is entirely right that they should be. The UK is respected the world over for the checks and balances of its research governance. The relevant safeguards already exist in the current legislation. Adding a further public interest test will only increase the amount of bureaucracy that will inevitably be introduced by the research services offices, or RSOs, and the translational research offices, or TROs, of our universities, which are very good at doing this.

The extra burden will fall on the researchers themselves, and some researchers may decide to concentrate their available time and energy elsewhere. This amendment, I am afraid, will have the unintended consequence of having a negative impact on research in this country, so I cannot support it.

Lord Clement-Jones (LD)

My Lords, an onlooker might be forgiven for not perceiving a common theme in this group of amendments, but I thank the Minister for his introduction and the noble Viscounts for introducing their amendments so clearly.

I acknowledge that Motion 32A and Amendments 32B and 32C, and Motion 52A and Amendments 52B and 52C, from the noble Viscount, Lord Camrose, are considerably less prescriptive than the Spencer amendment in the House of Commons to introduce new Clause 21, which seemed to require public authorities to comb through every record to rectify data, which went significantly further than the findings of the Supreme Court judgment, and which potentially failed to account for the privacy afforded to GRC holders under the Gender Recognition Act. However, the Liberal Democrats will abstain from votes on the noble Viscount’s amendments for several key reasons.

Our primary reason is the need to allow time for the EHRC’s guidance to be finalised. I thought the Minister made his case there. The EHRC is currently updating its code of practice, as we have heard, to reflect the implications of the Supreme Court judgment on the meaning of sex in the Equality Act, with the aim of providing it to the Government by the end of June. This guidance, as I understand it, is intended specifically to support service providers, public bodies and others in understanding their duties under the Equality Act and putting them into practice in the light of the judgment. The EHRC is undertaking a public consultation to understand how the practical implications can best be reflected. These amendments, in our view, are an attempt to jump the gun on, second-guess or at the least pre-empt the EHRC’s code of practice.

On these Benches, we believe that any necessary changes or clarifications regarding data standards should be informed by the official guidance and implemented consistently across public authorities in a coherent and workable manner, and that we should therefore allow time for the EHRC’s guidance to be finalised. We also have concerns about workability and clarity. Although the amendments proposed by the noble Viscount, Lord Camrose, are less prescriptive than the previous similar proposals tabled in the Commons by Dr Spencer, questions arise about their practical implementation: how would public authorities reliably ascertain biological sex if someone has a gender recognition certificate and has updated their birth certificate? I have long supported same-sex wards in the NHS, but I do not believe that these amendments are helpful in pursuing clarity following the Supreme Court judgment. We heard what the Minister had to say about passports.

I welcome the clarity provided by the Supreme Court judgment, but there are clearly implications, both practical and legal, to be worked out, such as those mentioned by the noble Viscount, Lord Hailsham. I thought he put his finger on many of those issues. I trust that the EHRC will deliver the right result. I agree that data needs to be accurate, and I welcome the Sullivan report, as did my noble friend. In summary, we will be abstaining. We believe that the EHRC process needs to conclude and provide comprehensive guidance, while also reflecting concerns about the workability and appropriateness of specific legislative interventions on data standards at this time.

I move on to Amendment 43B, tabled by the noble Viscount, Lord Colville. This amendment may not reinstate the precise wording

“conducted in the public interest”

that we previously inserted in this House, but it would introduce safeguards that seek to address the same fundamental concerns articulated during our debate on Report. It does two important things.

First, it provides a definition of “scientific research”, clarifying it as

“creative and systematic work undertaken in order to increase the stock of knowledge”.

This directly addresses the concerns raised on Report that the line between product development and scientific research is often blurred, with developers sometimes presenting efforts to increase model capabilities or to study risks as scientific research. Having a clear definition helps to distinguish genuine research from purely commercial activity cloaked as such.

Secondly, and critically, Amendment 43B would require:

“To meet the reasonableness test”

already present in the Bill,

“the activity being described as scientific research must be conducted according to appropriate ethical, legal and professional frameworks, obligations and standards”.

This requirement seeks to embed within the reasonableness test the principles that underpinned our arguments for the public interest requirement on Report. It is the same as the amendment put forward by the chair of the Science, Innovation and Technology Select Committee, Chi Onwurah MP, which ties the definition of scientific research to that in the OECD’s Frascati Manual: Guidelines for Collecting and Reporting Data on Research and Experimental Development:

“creative and systematic work undertaken in order to increase the stock of knowledge—including knowledge of humankind, culture and society—and to devise new applications of available knowledge”.

The Frascati framework is used worldwide by Governments, universities and research institutions to report R&D statistics, inform science policy and underpin R&D tax credit regimes, and it serves as a common language and reference point for international comparisons and policy decisions related to scientific research and innovation. These frameworks, obligations and standards are important because they serve the very purposes we previously identified for the public interest test: ensuring societal benefit, building public trust, preventing misuse for commercial ends, addressing harmful applications and ensuring alignment with standards.

Amendment 43B in the name of the noble Viscount, Lord Colville, is a thoughtful and necessary counter-proposal. It is Parliament’s opportunity to insist that the principles of public benefit, trust and responsible conduct, rooted in established frameworks, must remain central to the definition of scientific research that benefits from data re-use exceptions.

I heard what the noble Lord, Lord Winston, had to say in his very powerful speech, but I cannot see how the amendment from the noble Viscount, Lord Colville, cuts across all the things that he wants to see in the outcomes of research.

--- Later in debate ---
Lord Tarassenko (CB)

My Lords, I rise to speak as the founder of two AI spin-outs, and I draw the House’s attention to my registered interests as the founder-director of Oxehealth, a University of Oxford spin-out that uses AI for healthcare applications. I am also the author of three copyrighted books.

Since these amendments were last debated in the House of Lords, there has been a lot of high-profile comment but very few attempts, if any, to bring AI developers and creators together in the same room. During the same period, however, more businesses from the creative industries and the publishing sector have agreed content-licensing deals. That is because access to curated, high-quality content to fine-tune large language models—the step after pre-training which provides high-accuracy responses—is increasingly being monetised.

Even the Guardian Media Group, a strong supporter of the creative industries, announced in February a strategic partnership with OpenAI to ensure compensation for the use of its high-quality journalism. This shows that it is possible, without any change in the law, for the creative industries and the big tech companies to come to licensing agreements.

The main technological development since our last debate has been the demonstration that training LLMs no longer requires the massive computer facilities and huge data centres of the big tech companies in the US. Since the beginning of the year, the Chinese company DeepSeek has released open-source LLMs hundreds of times smaller than hyperscale models such as GPT-4, Gemini or Claude Sonnet. These models, typically with, say, 10 billion weights, have been developed through the process of distillation, and they achieve almost the same level of performance as the hyperscale models with 1 trillion weights.

Why is that important? It means that users of LLMs no longer have to send queries to those hyperscale models, to be processed by OpenAI, Google or Anthropic using their huge compute facilities with thousands of GPUs in their data centres. Instead, any AI developer can now train and run distilled versions of those LLMs locally on their laptops.
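
As a minimal sketch of what distillation means in practice (purely illustrative, and not the actual DeepSeek or Google training code), the standard teacher-student recipe looks something like this in PyTorch-style Python:

```python
# Illustrative sketch of knowledge distillation: a small "student" model is
# trained to reproduce the softened output distribution of a large "teacher".
# The model objects, batch and optimiser are assumed to exist; this is a
# generic recipe, not any company's actual training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student next-token distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by temperature**2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

def distillation_step(student_model, teacher_model, batch, optimizer):
    with torch.no_grad():                      # the teacher is frozen
        teacher_logits = teacher_model(batch)
    student_logits = student_model(batch)      # the small model being trained
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point of the sketch is simply that, once distilled, the student no longer needs the teacher or its training data: it runs on its own, which is why such models can be used locally on a laptop.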

DeepSeek was the first AI company to show how powerful the process of distillation is in the context of LLMs. Other big tech companies are now jumping on the bandwagon. In early March, Google released a brand-new LLM called Gemma 3, a lightweight, state-of-the-art open-source model that can be run anywhere from smartphones to laptops, and has the ability to handle text, images, and short videos.

These open-source distilled LLMs are now being used by thousands of AI developers, in the UK and elsewhere, who are training and fine-tuning them using content publicly available on the web, some of which may be copyrighted. Training an LLM on a laptop using data from the open web will become as commonplace as searching the web. This is already happening both within computer science departments in UK universities and in the rich ecosystem of AI start-ups and university spin-outs in the UK.

In a survey of 500 developers and investors in the UK AI ecosystem, carried out by JL Partners last month, 94% reported that their work relied on AI models built using publicly available data from the web, and 66% reported that, if the data laws in the UK were more restrictive than elsewhere, projects would move to other countries. We need to consider the impact on the UK’s AI industry of these transparency provisions, and of the requirement to provide copyright owners with information regarding the text and data used in the pre-training, training and fine-tuning of general-purpose AI.

The use of content from behind paywalls or from pirated databases such as Books3 or LibGen, which is known to have been done by Meta to train its LLM, is clearly illegal. However, for data publicly available on the open web, I would like to do a simple thought experiment to show that the transparency requirements in Motion 49A are at present unworkable. In the UK, unlike in the US, there is no copyright database. Usually, the copyright rests with the author of the work, but there are exceptions, such as when a work is created by an employee in the course of their job, and copyright may also be assigned or transferred to a third party. If we assume, generously, that it might take just one second, on average, to ascertain the copyright status of an article, book, image, or audio or video recording, on the web, it would require 31 years and eight months to check the copyright status of the 1 billion data points in a typical LLM training set—never mind thinking about setting up licensing deals with the millions of rights holders. For the distilled models that are now, as I explained, being trained or fine-tuned by UK developers, which are 100 times smaller, the copyright status check would still require one-third of a year—still an entirely unworkable proposition.
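
For those who wish to check the arithmetic, a back-of-the-envelope calculation reproduces the figures quoted above; the one-second-per-item figure is, of course, the generous assumption already stated, not a measured value:

```python
# Back-of-the-envelope check of the figures quoted above (illustrative only).
SECONDS_PER_YEAR = 365.25 * 24 * 3600        # about 31.6 million seconds

def years_to_check(n_items, seconds_per_item=1.0):
    """Years needed to check copyright status at a fixed cost per item."""
    return n_items * seconds_per_item / SECONDS_PER_YEAR

print(years_to_check(1_000_000_000))   # ~31.7 years for a 1 billion-item training set
print(years_to_check(10_000_000))      # ~0.32 years (about 4 months) for a set 100 times smaller
```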

Data (Use and Access) Bill [HL]

Lord Tarassenko (CB)

My Lords, I will speak briefly in support of this amendment. Anyone who has written computer code, and I plead guilty, knows that large software systems are never bug-free. These bugs can arise because of software design errors, human errors in coding or unexpected software interactions for some input data. Every computer scientist or software engineer will readily acknowledge that computer systems have a latent propensity to function incorrectly.
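
To make that concrete, here is a purely illustrative example (not the Horizon defect itself) of how a latent bug can stay invisible for most inputs and then produce a phantom discrepancy for others:

```python
# Illustrative only: a till-reconciliation routine that looks correct but
# accumulates error for some inputs, because binary floating point cannot
# represent many decimal amounts exactly. This is NOT the Horizon defect,
# just an example of a latent bug triggered only by particular input data.
def till_balance(transactions):
    total = 0.0
    for amount in transactions:
        total += amount              # latent defect: float arithmetic used for money
    return total

print(till_balance([10.00, -10.00]))            # 0.0, looks fine
print(till_balance([0.10, 0.10, 0.10, -0.30]))  # 5.55e-17, a phantom shortfall

# The robust version keeps money in exact decimal arithmetic:
from decimal import Decimal
print(sum((Decimal("0.10") for _ in range(3)), Decimal("0")) - Decimal("0.30"))  # 0.00
```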

As the noble Baroness, Lady Kidron, has already said, we all regularly experience the phenomenon of bug fixing when we download updates to software products in everyday use—for example, Office 365. These updates include not only new features but patches to fix bugs which have become apparent only in the current version of the software. The legal presumption of the proper functioning of “mechanical instruments” that courts in England and Wales have been applying to computers since 1999 has been shown by the Post Office Horizon IT inquiry to be deeply flawed. The more complex the program, the more likely the occurrences of incorrect functioning, even with modular design. The program at the heart of Fujitsu’s Horizon IT system had tens of millions of lines of code.

The unwillingness of the courts to accept that the Horizon IT system developed for the Post Office was unreliable and lacking in robustness—until the key judgment, which has already been mentioned, by Mr Justice Fraser in 2019—is one of the main reasons why more than 900 sub-postmasters were wrongly prosecuted. The error logs of any computer system make it possible to identify unexpected states in the computer software and hence erroneous system behaviour. Error logs for the Horizon IT system were disclosed only in response to a direction from the court in early 2019. At that point, the records from Fujitsu’s browser-based incident management system revealed 218,000 different error records for the Horizon system.

For 18 years prior to 2019, the Post Office did not disclose any error log data, documents which are routinely maintained and kept for any computer system of any size and complexity. Existing disclosure arrangements in legal proceedings do not work effectively for computer software, and this amendment concerning the electronic evidence produced by or derived from a computer system seeks to address this issue. The Post Office Horizon IT inquiry finished hearing evidence yesterday, having catalogued a human tragedy of unparalleled scale, one of the most widespread miscarriages of justice in the UK. Whether it is by means of this amendment or otherwise, wrongful prosecutions on the basis that computers always operate properly cannot continue any longer.

The Earl of Erroll (CB)

My Lords, if I may just interject, I have seen this happen not just in the Horizon scandal. Several years ago, the banks were saying that you could not possibly find out someone’s PIN and were therefore refusing to refund people who had had stuff stolen from them. It was not until the late Professor Ross Anderson, of the computer science department at Cambridge University, proved that they had been deliberately misidentifying to the courts which counter should have been read, and explained exactly how you could get the system to default back to a different set of counters, that the banks eventually had to give way. But they went on lying to the courts for a long time. I am afraid that this is something that keeps happening again and again, and an amendment like this is essential for future justice for innocent people.

--- Later in debate ---
The Deputy Chairman of Committees (Lord Russell of Liverpool) (CB)

My Lords, before we proceed, I draw to the attention of the Committee that we have a hard stop at 8.45 pm and we have committed to try to finish the Bill this evening. Could noble Lords please speak quickly and, if possible, concisely?

Lord Tarassenko (CB)

My Lords, I support my noble friend Lady Kidron’s Amendment 211, to which I have put my name. I speak not as a technophobe but as a card-carrying technophile. I declare an interest as, for the past 15 years, I have been involved in the development of algorithms to analyse NHS data, mostly from acute NHS trusts. This is possible under current regulations, because all the research projects have received medical research ethics approval, and I hold an honorary contract with the local NHS trust.

This amendment is, in effect, designed to scale up existing provisions and make sure that they are applied to public sector data sources such as NHS data. By classifying such data as sovereign data assets, it would be possible to make it available not only to individual researchers but to industry—UK-based SMEs and pharmaceutical and big tech companies—under controlled conditions. One of these conditions, as indicated by proposed new subsection (6), is to require a business model where income is generated for the relevant UK government department from access fees paid by authorised licence holders. Each government department should ensure that the public sector data it transfers to the national data library is classified as a sovereign data asset, which can then be accessed securely through APIs acting

“as bridges between each sovereign data asset and the client software of the authorized licence holders”.
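
Purely by way of illustration of how such an API bridge might work, a minimal sketch in Python might look like the following; the endpoint, licence register and fee metering are my own assumptions and are not specified in the amendment:

```python
# Illustrative sketch only: an access-controlled API "bridge" in front of a
# sovereign data asset. The endpoint names, licence register and fee metering
# are hypothetical and are not taken from the amendment's text.
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical register of authorised licence holders and their access tiers.
LICENCE_REGISTER = {"uk-sme-0001": {"holder": "Example Health Ltd", "tier": "standard"}}

@app.get("/v1/assets/{asset_id}/records")
def read_records(asset_id: str, x_licence_key: str = Header(...)):
    licence = LICENCE_REGISTER.get(x_licence_key)
    if licence is None:
        raise HTTPException(status_code=403, detail="No valid licence for this asset")
    # A real bridge would log the access for fee calculation, enforce the
    # licence tier's query limits, and return only approved, de-identified fields.
    return {"asset": asset_id, "licensee": licence["holder"], "records": []}
```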

In the time available, I will consider the Department of Health and Social Care. The report of the Sudlow review, Uniting the UK’s Health Data: A Huge Opportunity for Society, published last month, sets out what could be achieved through linking multiple NHS data sources. The Academy of Medical Sciences has fully endorsed the report:

“The Sudlow recommendations can make the UK’s health data a truly national asset, improving both patient care and driving economic development”.

There is little difference, if any, between health data being “a truly national asset” and “a sovereign asset”.

Generative AI has the potential to extract clinical value from linked datasets in the various secure data environments within the NHS and to deliver a step change in patient care. It also has the potential to deliver economic value, as the application of AI models to these rich, multimodal datasets will lead to innovative software products being developed for early diagnosis and personalised treatment.

However, it seems that the rush to generate economic value is preceding the establishment of a transparent licensing system, as in proposed new subsection (3), and the setting up of a coherent business model, as in proposed new subsection (6). As my noble friend Lady Kidron pointed out, the provisions in this amendment are urgently needed, especially as the chief data and analytics officer at NHS England is reported as having said, at a recent event organised by the Health Service Journal and IBM, that the national federated data platform will soon be used to train different types of AI model. The two models mentioned in the speech were OpenAI’s proprietary ChatGPT model and Google’s medical AI, which is based on its proprietary large language model, Gemini. So, the patient data in the national federated data platform being built by Palantir, which is a US company, is, in effect, being made available to fine-tune large language models pretrained by OpenAI and Google—two big US tech companies.

As a recent editorial in the British Medical Journal argued:

“This risks leaving the NHS vulnerable to exploitation by private technology companies whose offers to ‘assist’ with infrastructure development could result in loss of control over valuable public assets”.

It is vital for the health of the UK public sector that there is no loss of control resulting from premature agreements with big tech companies. These US companies seek privileged access to highly valuable assets which consist of personal data collected from UK citizens. The Government must, as a high priority, determine the rules for access to these sovereign data assets along the lines outlined in this amendment. I urge the Minister to take on board both the aims and the practicalities of this amendment before any damaging loss of control occurs.

Lord Freyberg (CB)

My Lords, I support Amendment 211 moved by my noble friend Lady Kidron, which builds on earlier contributions in this place made by the noble Lords, Lord Mitchell, Lord Stevenson, Lord Clement-Jones, and myself, as long ago as 2018, about the need to maximise the social, economic and environmental value that may be derived from personal data of national significance and, in particular, data controlled by our NHS.

The proposed definition of “sovereign data assets” is, in some sense, broad. However, the intent to recognise, protect and maximise their value in the public interest is readily inferred. The call for a transparent licensing regime to provide access to such assets and the mention of preferential access for individuals and organisations headquartered in the UK also make good sense, as the overarching aim is to build and maintain public trust in third-party data usage.

Crucially, I fully support provisions that would require the Secretary of State to report on the value and anticipated financial return from sovereign data assets. Identifying a public body that considered itself able or willing to guarantee value for money proved challenging when this topic was last explored. For too long, past Governments have dithered and delayed over the introduction of provisions that explicitly recognise the need to account for and safeguard the investment made by taxpayers in data held by public and arm’s-length institutions and associated data infrastructure—something that we do as a matter of course where the tangible assets that the National Audit Office monitors and reports on are concerned.

In recent weeks, the Chancellor of the Exchequer has emphasised the importance of recovering public funds “lost” during the Covid-19 pandemic. Yet this focus raises important questions about other potential revenue streams that were overlooked, particularly regarding NHS data assets. In 2019, Ernst & Young estimated that a curated NHS dataset could generate up to £5 billion annually for the UK while also delivering £4.6 billion in yearly patient benefits through improved data infrastructure. This prompts the question: who is tracking whether these substantial economic and healthcare opportunities are being realised? Who is ensuring that these projected benefits—both financial and clinical—are actually flowing back into our healthcare system?

As we enter the age of AI, public discourse often fixates on potential risks while overlooking a crucial opportunity—namely, the rapidly increasing value of publicly controlled data and its potential to drive innovation and insights. This raises two crucial questions. First, how might we capitalise on the upside of this technological revolution to maximise the benefits on behalf of the public? Secondly, and more specifically, how will Parliament effectively scrutinise any eventual trade deal entered into with, for example, the United States of America, which might focus on a more limited digital chapter, in the absence of either an accepted valuation methodology or a transparent licensing system for use in providing access to valuable UK data assets?

Will the public, faced with a significant tax burden to improve public services and repeated reminders of the potential for data and technology to transform our NHS, trust the Government if they enable valuable digital assets to be stripped today only to be turned tomorrow into cutting-edge treatments that we can ill afford to purchase and that benefit companies paying taxes overseas? To my mind, there remains a very real risk that the UK, as my noble friend Lady Kidron rightly stated, will inadvertently give away potentially valuable digital assets without there being appropriate safeguards in place. I therefore welcome the intent of Amendment 211 to put that right in the public interest.

Public Authority Algorithmic and Automated Decision-Making Systems Bill [HL]

Lord Tarassenko (CB)

My Lords, it is a pleasure to follow the noble Baroness, Lady Lane-Fox. I agree with her points about implementation and upskilling the Civil Service. There is much that I want to say about automated decision-making, but I will focus on only one issue in the time available.

The draft Bill anticipates the spread of AI systems into ADM, with foundation models mentioned as components within the overall system. Large language models such as ChatGPT, which is probably the best-known example of a foundation model, typically operate non-deterministically. When generating the next word in a sequence, they sample from a probability distribution rather than always selecting the word with the highest probability. Therefore, ChatGPT will not always give the same response to the same query, as I am sure many noble Lords have discovered empirically.

OpenAI introduced a setting in the API for its ChatGPT models last year to enable deterministic behaviour. However, there are other sources of non-determinism in the LLMs available from big tech companies. Very slight changes in a query—for example, just in the punctuation or through the simple addition of the word “please” at the start—can have a major impact on the answer generated by the models.
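
The sampling point can be illustrated with a toy example; the vocabulary and probabilities below are invented purely for illustration and have nothing to do with any real model:

```python
# Illustrative only: why sampling-based decoding is non-deterministic while
# greedy (argmax) decoding is not. The "vocabulary" and probabilities are toy values.
import numpy as np

vocab = ["treatment", "therapy", "care", "surgery"]
probs = np.array([0.40, 0.35, 0.20, 0.05])    # model's next-word distribution

rng = np.random.default_rng()                  # no fixed seed: output varies run to run
sampled = [str(rng.choice(vocab, p=probs)) for _ in range(3)]
greedy = vocab[int(np.argmax(probs))]          # always "treatment"

print(sampled)   # e.g. ['therapy', 'treatment', 'care'], can differ on every run
print(greedy)    # deterministic

# Fixing the seed, e.g. rng = np.random.default_rng(42), makes the sampling
# repeatable; that is the kind of control a deterministic API setting provides.
```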

The models are also regularly updated, and older versions are no longer supported. If any ADM system used by a public authority relies on a deprecated version of a closed-source proprietary AI system from a company such as Google or OpenAI, it will no longer be able to operate reproducibly. For example, when using ChatGPT, OpenAI’s newer GPT-4 model will generate quite different outputs from GPT-3.5 for the same input data.

I have given these brief examples of non-deterministic and non-reproducible behaviour to underline a very important point: the UK public sector will not be able to control the implementation or evolution of the hyperscale foundation models trained at great cost by US big tech companies. The training and updating of these models will be determined solely by the commercial interests of those companies, not by the requirements of the UK public sector.

To have complete control over training data, learning algorithms, system behaviour and software updates, the UK Government need to fund the development of a sovereign AI capability for public sector applications. This could be a set of tailor-made, medium-scale AI models, each developed by the relevant government department, possibly in partnership with universities or UK-based companies willing to disclose full details of algorithms and computer code. Only then will the behaviour of AI algorithms for ADM be transparent, deterministic and reproducible—requirements that should be built into legislation.

I welcome this Bill, but the implications of introducing AI models into ADM within the public sector need to be fully thought through. If we do not, we risk losing the trust of our fellow citizens in a technology that has the potential to deliver considerable benefits by speeding up and improving decision-making processes.