Large Language Models and Generative AI (Communications and Digital Committee Report) Debate
Lord Tarassenko (Crossbench - Life peer)
(1 month ago)
Lords Chamber
My Lords, I draw the House’s attention to my registered interests as a director of Oxehealth, a University of Oxford spin-out that uses AI for healthcare applications.
It is a great pleasure to follow the noble Lord, Lord Ranger, in this debate. Like him, in the time available I will speak mostly about the opportunities for the UK, more specifically in one sector. I congratulate the noble Baroness, Lady Stowell, on her excellent report, which she would probably like to know has been discussed positively in the common rooms in Oxford with the inquiry’s expert, Professor Mike Wooldridge.
It is not even 10 months since the report was published, but we already have new data points on the likely trajectories of large language models. The focus is shifting away from the pre-training of LLMs on ever-bigger datasets to what happens at inference time, when these models are used to generate answers to users’ queries. We are seeing the introduction of chain-of-reasoning techniques, for example in OpenAI’s o1, to encourage models to reason in a structured, logical and interpretable way. This approach may help users to understand the LLM’s reasoning process and increase their trust in the answers.
We are also seeing the emphasis shifting from text-only inputs into LLMs to multimodal inputs. Models are now being trained with images, videos and audio content; in fact, we should no longer call them large language models but large multimodal models—LMMs.
We are still awaiting the report on the AI opportunities action plan, written by Matt Clifford, the chair of ARIA, but we already know that the UK has some extraordinary datasets and a strong tradition of trusted governance, which together represent a unique opportunity for the application of generative AI.
The Sudlow review, Uniting the UK’s Health Data: A Huge Opportunity for Society, published two weeks ago tomorrow, hints at what could be achieved through linking multiple NHS data sources. The review stresses the need to recognise that national health data is part of the critical national infrastructure; we should go beyond this by identifying it as a sovereign data asset for the UK. As 98% of the 67 million UK citizens receive most of their healthcare from the NHS, this data is the most comprehensive large-scale healthcare dataset worldwide.
Generative AI has the potential to extract the full value from this unique, multimodal dataset and deliver a step change in disease prevention, diagnosis and treatment. To unlock insights from the UK’s health data, we need to build a sovereign AI capability that is pre-trained on the linked NHS datasets and does not rely on closed, proprietary models such as OpenAI’s GPT-4 or Google’s Gemini, which are pre-trained on the entire content of the internet. This sovereign AI capability will be a suite of medium-scale sovereign LMMs, or HealthGPTs if you will, applied to different combinations of de-identified vital-sign data, laboratory tests, diagnostic tests, CT scans, MR scans, pathology images, discharge summaries and outcomes data, all available from the secure data environments—SDEs—currently being assembled within the NHS.
Linked datasets enable the learning of new knowledge within a large multimodal model; for example, an LMM pre-trained on linked digital pathology data and CT scans will be able to learn how different pathologies appear on those CT scans. Of course, very few patients will have a complete dataset, but generative AI algorithms can naturally handle the variability of each linked record. In addition, each LMM dataset can be augmented by text from medical textbooks, research papers and content from trusted websites such as those maintained by, for example, NHS England or Diabetes UK.
A simple example of such an LMM will help to illustrate the power of this approach for decision support—not decision-making—in healthcare. Imagine a patient turning up at her GP practice with a hard-to-diagnose autoimmune disease. With a description of her symptoms, together with the results of lab tests and her electronic patient record—EPR—data, DrugGPT, which is currently under development, will not only suggest a diagnosis to the GP but also recommend the right drugs and the appropriate dosage for that patient. It will also highlight any possible drug-drug interactions from knowledge of the patient’s existing medications in her EPR.
Of course, to build this sovereign LMM capability, the suite of HealthGPTs such as DrugGPT will require initial investment, but within five years such a capability should be self-funding. Access to any HealthGPT from an NHS log-in would be free; academic researchers, UK SMEs and multinationals would pay to access it through a suitable API, with a variable tariff according to the type of user. This income could be used to fund the HealthGPT lab, as well as the data wrangling and data curation activities of the teams maintaining the NHS’s secure data environments, with the surplus going to NHS trusts and GP practices in proportion to the amount of de-identified data which they will have supplied to pre-train the HealthGPTs.
For the general public to understand the value of the insights generated by these sovereign LMMs, a quarterly report on key insights would be sent out through the NHS app to all its 34 million users—three-quarters of the adult population in the UK—except to those who have opted out of having their data used for research.
The time is right to build a sovereign AI capability for health based on a suite of large multimodal models, which will improve the health of the nation, delivering more accurate diagnoses and better-targeted treatments while maximising the value of our NHS sovereign data asset. I hope the Minister will agree with me that this is an opportunity which the UK cannot afford to miss.