Debates between Lord Clement-Jones and Lord Lucas during the 2024 Parliament

Wed 18th Dec 2024

Data (Use and Access) Bill [HL]

Debate between Lord Clement-Jones and Lord Lucas
Lord Lucas Portrait Lord Lucas (Con)
- Hansard - - - Excerpts

My Lords, I very much support these amendments. I declare an interest as an owner of written copyright in the Good Schools Guide and as a father of an illustrator. In both contexts, it is very important that we get intellectual property right, as I think the Government recognised in what they put out yesterday. However, I share the scepticism of those who have spoken as to whether the Government’s ideas can be made to work.

It is really important that we get this straight. For those of us operating at the small end of the scale, IP is under continual threat from established media. I write maybe 10 or a dozen letters a year to large media outfits reminding them of the borders, the latest to the Catholic Herald—it appears not even the 10 commandments have force on them. But what AI can do is a huge measure more difficult to deal with. I can absolutely see, by talking to Copilot, that it has gone through my paywall and absorbed the contents of the Good Schools Guide, but who am I supposed to go at for this? Who has actually done the trespassing? Who is responsible for it? Where is the ownership? It is difficult to enforce copyright, even by writing a polite letter to someone saying, “Please don’t do this”. The Government appear to propose a system of polite letters saying, “Oh dear, it looks as if you might have borrowed my copyright. Please, can you give it back?”

This is not practically enforceable, and it will not result in people who care about IP locating their businesses here. Quite clearly, we do not have ownership of the big AI systems, and it is unlikely that we will have ownership of them—all that will be overseas. What we can do is create IP. If we produce a system where we do not defend the IP that we produce, then fairly rapidly, those IP creators who are capable of being mobile will go elsewhere to places that will defend their IP. It is something that a Government who are interested in growth really ought to be interested in defending. I hope that we will see some real progress in the course of the Bill going through the House.

Lord Clement-Jones Portrait Lord Clement-Jones (LD)
- Hansard - -

My Lords, I declare my AI interests as set out in the register. I will speak in support of Amendments 204, 205 and 206, which have been spoken to so inspiringly by the noble Baroness, Lady Kidron, and so well by the noble Lords, Lord Freyberg, Lord Lucas and Lord Hampton, the noble Earl, Lord Clancarty, and the noble Viscount, Lord Colville. Each demonstrated different facets of the issue.

I co-chair the All-Party Group on AI and chaired the AI Select Committee a few years ago. I wrote a book earlier this year on AI regulation, which had a namecheck from the noble Baroness, Lady Jones, at Question Time, which I was very grateful for. Before that, I had a career as an IP lawyer, defending copyright and creativity, and in this House, I have been my party’s creative industries spokesperson. The question of IP and the training of generative AI models is a key issue for me.

This is the case not just in the UK but around the world. Getty and the New York Times are suing in the United States, as are many writers, artists and musicians. It was at the root of the Hollywood actors’ and writers’ strikes last year. It is one thing to use the tech—many of us are AI enthusiasts—but it is another to be at the mercy of it.

Close to home, the FT has pointed out, using the index published by the creator of an unlicensed dataset called Books3, published online, that it is possible to identify that over 85 books written by 33 Members of the House of Lords have been pirated to train AI models from household names, such as Meta, Microsoft and Bloomberg. Although it is absolutely clear that we know that the use of copyrighted works to train AI models is contrary to UK copyright law, the laws around the transparency of these activities have not caught up. As we have heard, as well as using pirated e-books in their training data, AI developers scrape the internet for valuable professional journalism and other media, in breach of both the terms of service of websites and copyright law, to train commercial AI models. At present, developers can do this without declaring their identity, or they may use IP scraped to appear in a search index for the completely different commercial purpose of training AI models.

How can rights owners opt out of something that they do not know about? AI developers will often scrape websites or access other pirated material before they launch an LLM in public. This means that there is no way for IP owners to opt out of their material being taken before its inclusion in these models. Once used to train these models, the commercial value, as we have heard, has already been extracted from IP scraped without permission, with no way to delete data from these models.

The next wave of AI models responds to user queries by browsing the web to extract valuable news and information from professional news websites. This is known as retrieval-augmented generation—RAG. Without payment for extracting this commercial value, AI agents built by companies such as Perplexity, Google and Meta will, in effect, free-ride on the professional hard work of journalists, authors and creators. At present, such crawlers are hard to block. There is no market failure; there are well-established licensing solutions. There is no uncertainty around the existing law; the UK is absolutely clear that commercial organisations, including gen AI developers, must license the data that they use to train their large language models.

Here, as the Government’s intentions become clearer, the political, business and creative temperature is rising. Just this week, we have seen the creation of a new campaign, the Creative Rights in AI Coalition—CRAIC —across the creative and news industries and, recently, Ed Newton-Rex reached more than 30,000 signatories from among creators and creative organisations.

--- Later in debate ---
Lord Lucas Portrait Lord Lucas (Con)
- Hansard - - - Excerpts

My Lords, having a system such as this would really focus the public sector on how we can generate more datasets. As I said earlier, education is an obvious one, but so is mobile phone data. All these companies have their licences. If a condition of the licence was that the data on how people move around the UK became a public asset, that would be hugely beneficial to policy formation. If we really understood how, why and when people move, we would make much better decisions. We could save ourselves huge amounts of money. We really ought to have this as a deep focus of government policy.

Lord Clement-Jones Portrait Lord Clement-Jones (LD)
- Hansard - -

My Lords, I have far too little time to do justice to this subject. We on these Benches welcome this amendment. It is entirely consistent with the sovereign health fund proposed by Future Care Capital and, indeed, with the proposals from the Tony Blair Institute for Global Change on a similar concept called the national data trust. Indeed, this concept formed part of our Liberal Democrat manifesto at the last general election, so of course I support the amendment.

It would be very useful to hear more about the national data library, including on its purpose and operation, as the noble Baroness, Lady Kidron, said. I entirely agree with her that there is a great need for a sovereign cloud service or services. Indeed, the inability to guarantee that data on the cloud is held in this country is a real issue that has not yet been properly addressed.