(1 month ago)
Grand Committee
My Lords, I will speak briefly in support of this amendment. Anyone who has written computer code, and I plead guilty, knows that large software systems are never bug-free. These bugs can arise because of software design errors, human errors in coding or unexpected software interactions for some input data. Every computer scientist or software engineer will readily acknowledge that computer systems have a latent propensity to function incorrectly.
As the noble Baroness, Lady Kidron, has already said, we all regularly experience the phenomenon of bug fixing when we download updates to software products in everyday use—for example, Office 365. These updates include not only new features but patches to fix bugs which have become apparent only in the current version of the software. The legal presumption of the proper functioning of “mechanical instruments” that courts in England and Wales have been applying to computers since 1999 has been shown by the Post Office Horizon IT inquiry to be deeply flawed. The more complex the program, the more likely the occurrences of incorrect functioning, even with modular design. The program at the heart of Fujitsu’s Horizon IT system had tens of millions of lines of code.
The unwillingness of the courts to accept that the Horizon IT system developed for the Post Office was unreliable and lacking in robustness—until the key judgment, which has already been mentioned, by Mr Justice Fraser in 2019—is one of the main reasons why more than 900 sub-postmasters were wrongly prosecuted. The error logs of any computer system make it possible to identify unexpected states in the computer software and hence erroneous system behaviour. Error logs for the Horizon IT system were disclosed only in response to a direction from the court in early 2019. At that point, the records from Fujitsu’s browser-based incident management system revealed 218,000 different error records for the Horizon system.
For 18 years prior to 2019, the Post Office did not disclose any error log data, documents which are routinely maintained and kept for any computer system of any size and complexity. Existing disclosure arrangements in legal proceedings do not work effectively for computer software, and this amendment concerning the electronic evidence produced by or derived from a computer system seeks to address this issue. The Post Office Horizon IT inquiry finished hearing evidence yesterday, having catalogued a human tragedy of unparalleled scale, one of the most widespread miscarriages of justice in the UK. Whether it is by means of this amendment or otherwise, wrongful prosecutions on the basis that computers always operate properly cannot continue any longer.
My Lords, if I may just interject, I have seen this happen not just in the Horizon scandal. Several years ago, the banks were saying that you could not possibly find out someone’s PIN and were therefore refusing to refund people who had had stuff stolen from them. It was not until the late Professor Ross Anderson, of the computer science department at Cambridge University, proved that they had been deliberately misidentifying to the courts which counter they should have been looking at, as to what was being read, and explained exactly how you could get the thing to default back to a different set of counters, that the banks eventually had to give way. But they went on lying to the courts for a long time. I am afraid that this is something that keeps happening again and again, and an amendment like this is essential for future justice for innocent people.
My Lords, before we proceed, I draw to the attention of the Committee that we have a hard stop at 8.45 pm and we have committed to try to finish the Bill this evening. Could noble Lords please speak quickly and, if possible, concisely?
My Lords, I support my noble friend Lady Kidron’s Amendment 211, to which I have put my name. I speak not as a technophobe but as a card-carrying technophile. I declare an interest as, for the past 15 years, I have been involved in the development of algorithms to analyse NHS data, mostly from acute NHS trusts. This is possible under current regulations, because all the research projects have received medical research ethics approval, and I hold an honorary contract with the local NHS trust.
This amendment is, in effect, designed to scale up existing provisions and make sure that they are applied to public sector data sources such as NHS data. By classifying such data as sovereign data assets, it would be possible to make it available not only to individual researchers but to industry—UK-based SMEs and pharmaceutical and big tech companies—under controlled conditions. One of these conditions, as indicated by proposed new subsection (6), is to require a business model where income is generated for the relevant UK government department from access fees paid by authorised licence holders. Each government department should ensure that the public sector data it transfers to the national data library is classified as a sovereign data asset, which can then be accessed securely through APIs acting
“as bridges between each sovereign data asset and the client software of the authorized licence holders”.
In the time available, I will consider the Department of Health and Social Care. The report of the Sudlow review, Uniting the UK’s Health Data: A Huge Opportunity for Society, published last month, sets out what could be achieved through linking multiple NHS data sources. The Academy of Medical Sciences has fully endorsed the report:
“The Sudlow recommendations can make the UK’s health data a truly national asset, improving both patient care and driving economic development”.
There is little difference, if any, between health data being “a truly national asset” and “a sovereign asset”.
Generative AI has the potential to extract clinical value from linked datasets in the various secure data environments within the NHS and to deliver a step change in patient care. It also has the potential to deliver economic value, as the application of AI models to these rich, multimodal datasets will lead to innovative software products being developed for early diagnosis and personalised treatment.
However, it seems that the rush to generate economic value is preceding the establishment of a transparent licensing system, as in proposed new subsection (3), and the setting up of a coherent business model, as in proposed new subsection (6). As my noble friend Lady Kidron pointed out, the provisions in this amendment are urgently needed, especially as the chief data and analytics officer at NHS England is reported as having said, at a recent event organised by the Health Service Journal and IBM, that the national federated data platform will soon be used to train different types of AI model. The two models mentioned in the speech were OpenAI’s proprietary ChatGPT model and Google’s medical AI, which is based on its proprietary large language model, Gemini. So, the patient data in the national federated data platform being built by Palantir, which is a US company, is, in effect, being made available to fine-tune large language models pretrained by OpenAI and Google—two big US tech companies.
As a recent editorial in the British Medical Journal argued:
“This risks leaving the NHS vulnerable to exploitation by private technology companies whose offers to ‘assist’ with infrastructure development could result in loss of control over valuable public assets”.
It is vital for the health of the UK public sector that there is no loss of control resulting from premature agreements with big tech companies. These US companies seek privileged access to highly valuable assets which consist of personal data collected from UK citizens. The Government must, as a high priority, determine the rules for access to these sovereign data assets along the lines outlined in this amendment. I urge the Minister to take on board both the aims and the practicalities of this amendment before any damaging loss of control.
My Lords, I support Amendment 211 moved by my noble friend Lady Kidron, which builds on earlier contributions in this place made by the noble Lords, Lord Mitchell, Lord Stevenson, Lord Clement-Jones, and myself, as long ago as 2018, about the need to maximise the social, economic and environmental value that may be derived from personal data of national significance and, in particular, data controlled by our NHS.
The proposed definition of “sovereign data assets” is, in some sense, broad. However, the intent to recognise, protect and maximise their value in the public interest is readily inferred. The call for a transparent licensing regime to provide access to such assets and the mention of preferential access for individuals and organisations headquartered in the UK also make good sense, as the overarching aim is to build and maintain public trust in third-party data usage.
Crucially, I fully support provisions that would require the Secretary of State to report on the value and anticipated financial return from sovereign data assets. Identifying a public body that considered itself able or willing to guarantee value for money proved challenging when this topic was last explored. For too long, past Governments have dithered and delayed over the introduction of provisions that explicitly recognise the need to account for and safeguard the investment made by taxpayers in data held by public and arm’s-length institutions and associated data infrastructure—something that we do as a matter of course where the tangible assets that the National Audit Office monitors and reports on are concerned.
In recent weeks, the Chancellor of the Exchequer has emphasised the importance of recovering public funds “lost” during the Covid-19 pandemic. Yet this focus raises important questions about other potential revenue streams that were overlooked, particularly regarding NHS data assets. In 2019, Ernst & Young estimated that a curated NHS dataset could generate up to £5 billion annually for the UK while also delivering £4.6 billion in yearly patient benefits through improved data infrastructure. This begs the question: who is tracking whether these substantial economic and healthcare opportunities are being realised? Who is ensuring that these projected benefits—both financial and clinical—are actually flowing back into our healthcare system?
As we enter the age of AI, public discourse often fixates on potential risks while overlooking a crucial opportunity—namely, the rapidly increasing value of publicly controlled data and its potential to drive innovation and insights. This raises two crucial questions. First, how might we capitalise on the upside of this technological revolution to maximise the benefits on behalf of the public? Secondly, and more specifically, how will Parliament effectively scrutinise any eventual trade deal entered into with, for example, the United States of America, which might focus on a more limited digital chapter, in the absence of either an accepted valuation methodology or a transparent licensing system for use in providing access to valuable UK data assets?
Will the public, faced with a significant tax burden to improve public services and repeated reminders of the potential for data and technology to transform our NHS, trust the Government if they enable valuable digital assets to be stripped today only to be turned tomorrow into cutting-edge treatments that we can ill afford to purchase and that benefit companies paying taxes overseas? To my mind, there remains a very real risk that the UK, as my noble friend Lady Kidron rightly stated, will inadvertently give away potentially valuable digital assets without there being appropriate safeguards in place. I therefore welcome the intent of Amendment 211 to put that right in the public interest.
(1 month, 1 week ago)
Lords Chamber
My Lords, it is a pleasure to follow the noble Baroness, Lady Lane-Fox. I agree with her points about implementation and upskilling the Civil Service. There is much that I want to say about automated decision-making, but I will focus on only one issue in the time available.
The draft Bill anticipates the spread of AI systems into ADM, with foundation models mentioned as components within the overall system. Large language models such as ChatGPT, which is probably the best-known example of a foundation model, typically operate non-deterministically. When generating the next word in a sequence, they sample from a probability distribution rather than always selecting the word with the highest probability. Therefore, ChatGPT will not always give the same response to the same query, as I am sure many noble Lords have discovered empirically.
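The difference between always selecting the most probable word and sampling from the distribution can be illustrated with a minimal sketch. It is a toy example only: the four-word vocabulary, the scores and the function names are assumptions made for illustration and do not describe how any particular commercial model is implemented.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def greedy_choice(words, probs):
    """Deterministic: always return the word with the highest probability."""
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return words[best_index]

def sampled_choice(words, probs, rng):
    """Non-deterministic: draw one word in proportion to its probability."""
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical candidate next words and scores (illustrative values only).
words = ["court", "system", "evidence", "computer"]
scores = [2.0, 1.5, 1.0, 0.5]
probs = softmax(scores)

rng = random.Random()  # unseeded, so repeated runs may differ
print("greedy :", [greedy_choice(words, probs) for _ in range(5)])
print("sampled:", [sampled_choice(words, probs, rng) for _ in range(5)])
```

The greedy line prints the same word five times, whereas the sampled line will usually contain a mixture of words and can change from one run to the next.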
OpenAI introduced a setting in the API for its ChatGPT models last year to enable deterministic behaviour. However, there are other sources of non-determinism in the LLMs available from big tech companies. Very slight changes in a query (for example, just in the punctuation or through the simple addition of the word “please” at the start) can have a major impact on the answer generated by the models.
The models are also regularly updated, and older versions are no longer supported. If any ADM system used by a public authority relies on a deprecated version of a closed-source proprietary AI system from a company such as Google or OpenAI, it will no longer be able to operate reproducibly. For example, when using ChatGPT, OpenAI’s newer GPT-4 model will generate quite different outputs from GPT-3.5 for the same input data.
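The general idea behind such a setting can be sketched by fixing the seed of the random number generator used for sampling. This is a toy illustration under stated assumptions, not the vendor's API: the vocabulary, the weights and the `generate` function are hypothetical, and a real hosted model has further sources of variation that a seed alone does not remove.

```python
import random

# Hypothetical vocabulary and sampling weights (illustrative values only).
VOCAB = ["granted", "refused", "deferred", "referred"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def generate(seed, length=5):
    """Sample a short sequence of words using a generator fixed by `seed`."""
    rng = random.Random(seed)  # same seed gives the same sequence of draws
    return rng.choices(VOCAB, weights=WEIGHTS, k=length)

first_run = generate(seed=42)
second_run = generate(seed=42)
print(first_run)
print(second_run)
print("reproducible:", first_run == second_run)
```

Calling `generate` twice with the same seed returns the same sequence, which is the sense in which seeded sampling is reproducible. It gives no guarantee once the query, the weights or the model version change.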
I have given these brief examples of non-deterministic and non-reproducible behaviour to underline a very important point: the UK public sector will not be able to control the implementation or evolution of the hyperscale foundation models trained at great cost by US big tech companies. The training and updating of these models will be determined solely by the commercial interests of those companies, not by the requirements of the UK public sector.
To have complete control over training data, learning algorithms, system behaviour and software updates, the UK Government need to fund the development of a sovereign AI capability for public sector applications. This could be a set of tailor-made, medium-scale AI models, each developed by the relevant government department, possibly in partnership with universities or UK-based companies willing to disclose full details of algorithms and computer code. Only then will the behaviour of AI algorithms for ADM be transparent, deterministic and reproducible—requirements that should be built into legislation.
I welcome this Bill, but the implications of introducing AI models into ADM within the public sector need to be fully thought through. If we do not, we risk losing the trust of our fellow citizens in a technology that has the potential to deliver considerable benefits by speeding up and improving decision-making processes.