Large Language Models and Generative AI (Communications and Digital Committee Report) Debate

Department: Department for Science, Innovation & Technology

Thursday 21st November 2024

Lords Chamber
Baroness Featherstone (LD)

My Lords, it is a great pleasure to follow the noble Lord, Lord Knight of Weymouth, and I pay tribute to the chair of the committee, the noble Baroness, Lady Stowell of Beeston, for her first-class chairing of what was a really complex issue to those of us who are not in the AI or tech industries. It was completely fascinating and very eye-opening—a masterclass.

Today I want to address one of the most pressing and critical issues raised by the noble Baroness, Lady Stowell: the clear evidence that creatives and their living are in great danger. They are up against the overwhelming and overweening power of the big tech companies and what appeared to be a great deal of reluctance by the industry to concede that a way to remunerate for intellectual property use was vital. It was clear that the LLM industry is using the products of the creative industries to train AI and is text and data mining extensively for its own benefit without paying for it. As I listened to a cascade of excuses and rationales for not dealing with the issue, it was a real-life example of killing the goose that laid the golden egg.

At its most basic, it is critical that we recognise original work, deliver fair compensation, and respect creators’ rights and economic justice. We listened to all the arguments about who owns what, how you prove it, how it is all too difficult, that it is like reading a book or that somehow it is a public good. But in the end, creatives must be recompensed for the use of their creations. We need to ensure a sustainable creative economy. As the noble Baroness said, the creative industries are a massive economic driver for our national economy.

There is both a legal and an ethical responsibility to ensure that there is adherence to copyright laws. Those laws exist to protect the work of creators. As this field develops, and AI becomes more integrated into industries, it is a critical requirement and ethical responsibility of companies to respect intellectual property. It was clear from the evidence we heard that much of the data mining that has been going on has taken place without any permission from or compensation to the rights holders. Yes, there were esoteric discussions as to where copyright belonged: could it really be the original artist when somewhere in a black box—or maybe it was a sandbox, I cannot remember—fibres were creating something anew from the feed? That may be challenging, but the onus is on the AI industry and the Government to protect our creatives. As a group, and given their talents, they are not always paid well anyway. For them not to receive anything, when their work provides the basis for AI training for an industry that is going to grow wildly economically rich, is simply not acceptable.

Our copyright law is absolutely clear on this. Moreover, the evidence given to the committee, such as from the Society of Authors, noted that AI systems “would simply collapse” if they could not access original talent. It was equally clear from Dan Conway, CEO of the Publishers Association, in his evidence to the committee, that LLMs

“are infringing copyrighted content on an absolutely massive scale … when they collect the information”

and in

“how they store the information and how they handle it”.

There was clear evidence from model outputs that developers had used pirated content from the Books3 database, and he alleged that they were “not currently compliant” with UK law. Microsoft countered with the argument that, basically, they were offering a public good and therefore copyright laws should not apply to ideas—good try.

I was also interested to receive a briefing from UK Music, which is concerned—justly, in my view—that the Government might try to introduce further text and data mining copyright exceptions, which would allow AI service providers to train their systems on music without the consent of, or need to compensate, its creators. The oft-made suggestion, as raised by the noble Baroness, is an opt-out system. It seems relatively practical: you could opt in if you did not mind your stuff being used, or you could opt out. But it will not work. There are no existing, effective opt-out schemes that reliably opt out content from training. Doing so is quite impossible. There is no way to have control over whether downstream uses of original work are opted out of generative AI training, since there is no control for the artist over the URLs where they are hosted—perhaps we should look at extraterritorial law. The evidence suggests that the majority of people who have the option to opt out of generative AI training do not even realise that they have the option. Moreover, if opt-out schemes are adopted, publishers and copyright holders will have only the illusion of choice. If they opt out of AI training, they opt out of being findable on the internet altogether.

Record keeping has also been suggested—I do not think the committee covered this, but I stand to be corrected. Currently there is no stand-alone legal requirement in the UK to disclose the material that AI systems are trained on, beyond the data protection law framework. I believe that record keeping should be mandatory.

AI cannot create in a vacuum. It needs huge data sets, so often drawn from copyrighted materials, to function. Clearly, it would be much better to encourage collaboration between the tech industry and the creative industries, instead of AI becoming a threat or being a threat, as it is. I implore AI companies to accept this thesis and ensure that they are transparent about how their models are trained and which data is used.

There are a lot of ideas around about group licensing and so on. It would be far more productive if the LLMs worked with the creatives. A lot of creatives are individuals or small companies. They just do not have the means to enforce their IP rights through the legal process or to track how their works are being used in AI training. That is why the committee’s recommendation that the IPO code must ensure that creators are fully empowered to exercise their rights is so important, alongside the requirement for developers to make clear whether their web crawlers are being used to acquire data for generative AI training or for other purposes.

Ultimately, AI’s integration into the creative industries brings a host of economic, ethical and legal challenges, but the most essential part is protecting the rights of creators to ensure fairness in the distribution of economic value, so that creators and the AI industry can both thrive. I trust the Government will ensure that the committee’s recommendations are implemented in full.