(3 days, 9 hours ago)
Lords Chamber
My Lords, I very much encourage the Government to go down this road. Everyone talks about the NHS just because the data is there and organised. If we establish a structure like this, there are other sources of data that we could develop to equivalent value. Education is the obvious one. What works in education? We have huge amounts of data, but we do nothing with it, both in schools and in higher education. What is happening to biodiversity? We do not presently collect the data or use it in the way we could, but if we had that, and if we took advantage of all the people who would be willing to help with that, we would end up with a hugely valuable national resource.
HMRC has a lot of information about employment and career patterns, none of which we use. We worry about what is happening and how we can improve seaside communities, but we do not collect the data which would enable us to do it. We could become a data-based society. This data needs guarding because it is not for general use—it is for our use, and this sort of structure seems a really good way of doing it. It is not just the NHS—there is a whole range of areas in which we could greatly benefit the UK.
My Lords, all our speakers have made it clear that this is a here-and-now issue. The context has been set out by noble Lords, whether it is Stargate, the AI Opportunities Action Plan or, indeed, the Palantir contract with the NHS. This has been coming down the track for some years. There are Members on the Government Benches, such as the noble Lords, Lord Mitchell and Lord Hunt of Kings Heath, who have been telling us that we need to work out a fair way of deriving a proper financial return for the benefits of public data assets, and Future Care Capital has done likewise. The noble Lord, Lord Freyberg, has form in this area as well.
The Government’s plan for the national data library and the concept of sovereign data assets raises crucial questions about how to balance the potential benefits of data sharing with the need to protect individual rights, maintain public trust and make sure that we achieve proper value for our public digital assets. I know that the Minister has a particular interest in this area, and I hope he will carry forward the work, even if this amendment does not go through.
(1 month, 1 week ago)
Grand Committee
My Lords, I very much support these amendments. I declare an interest as an owner of written copyright in the Good Schools Guide and as a father of an illustrator. In both contexts, it is very important that we get intellectual property right, as I think the Government recognised in what they put out yesterday. However, I share the scepticism of those who have spoken as to whether the Government’s ideas can be made to work.
It is really important that we get this straight. For those of us operating at the small end of the scale, IP is under continual threat from established media. I write maybe 10 or a dozen letters a year to large media outfits reminding them of the borders, the latest to the Catholic Herald—it appears not even the 10 commandments have force on them. But what AI can do is a huge measure more difficult to deal with. I can absolutely see, by talking to Copilot, that it has gone through my paywall and absorbed the contents of the Good Schools Guide, but who am I supposed to go at for this? Who has actually done the trespassing? Who is responsible for it? Where is the ownership? It is difficult to enforce copyright, even by writing a polite letter to someone saying, “Please don’t do this”. The Government appear to propose a system of polite letters saying, “Oh dear, it looks as if you might have borrowed my copyright. Please, can you give it back?”
This is not practically enforceable, and it will not result in people who care about IP locating their businesses here. Quite clearly, we do not have ownership of the big AI systems, and it is unlikely that we will have ownership of them—all that will be overseas. What we can do is create IP. If we produce a system where we do not defend the IP that we produce, then fairly rapidly, those IP creators who are capable of being mobile will go elsewhere to places that will defend their IP. It is something that a Government who are interested in growth really ought to be interested in defending. I hope that we will see some real progress in the course of the Bill going through the House.
My Lords, I declare my AI interests as set out in the register. I will speak in support of Amendments 204, 205 and 206, which have been spoken to so inspiringly by the noble Baroness, Lady Kidron, and so well by the noble Lords, Lord Freyberg, Lord Lucas and Lord Hampton, the noble Earl, Lord Clancarty, and the noble Viscount, Lord Colville. Each demonstrated different facets of the issue.
I co-chair the All-Party Group on AI and chaired the AI Select Committee a few years ago. I wrote a book earlier this year on AI regulation, which had a namecheck from the noble Baroness, Lady Jones, at Question Time, which I was very grateful for. Before that, I had a career as an IP lawyer, defending copyright and creativity, and in this House, I have been my party’s creative industries spokesperson. The question of IP and the training of generative AI models is a key issue for me.
This is the case not just in the UK but around the world. Getty and the New York Times are suing in the United States, as are many writers, artists and musicians. It was at the root of the Hollywood actors’ and writers’ strikes last year. It is one thing to use the tech—many of us are AI enthusiasts—but it is another to be at the mercy of it.
Close to home, the FT has pointed out, using the index published by the creator of an unlicensed dataset called Books3, published online, that it is possible to identify that over 85 books written by 33 Members of the House of Lords have been pirated to train AI models from household names, such as Meta, Microsoft and Bloomberg. Although it is absolutely clear that we know that the use of copyrighted works to train AI models is contrary to UK copyright law, the laws around the transparency of these activities have not caught up. As we have heard, as well as using pirated e-books in their training data, AI developers scrape the internet for valuable professional journalism and other media, in breach of both the terms of service of websites and copyright law, to train commercial AI models. At present, developers can do this without declaring their identity, or they may use IP scraped to appear in a search index for the completely different commercial purpose of training AI models.
How can rights owners opt out of something that they do not know about? AI developers will often scrape websites or access other pirated material before they launch an LLM in public. This means that there is no way for IP owners to opt out of their material being taken before its inclusion in these models. Once used to train these models, the commercial value, as we have heard, has already been extracted from IP scraped without permission, with no way to delete data from these models.
The next wave of AI models responds to user queries by browsing the web to extract valuable news and information from professional news websites. This is known as retrieval-augmented generation—RAG. Without payment for extracting this commercial value, AI agents built by companies such as Perplexity, Google and Meta will, in effect, free-ride on the professional hard work of journalists, authors and creators. At present, such crawlers are hard to block. There is no market failure; there are well-established licensing solutions. There is no uncertainty around the existing law; the UK is absolutely clear that commercial organisations, including gen AI developers, must license the data that they use to train their large language models.
Here, as the Government’s intentions become clearer, the political, business and creative temperature is rising. Just this week, we have seen the creation of a new campaign, the Creative Rights in AI Coalition (CRAIC), across the creative and news industries and, recently, Ed Newton-Rex reached more than 30,000 signatories from among creators and creative organisations.
My Lords, having a system such as this would really focus the public sector on how we can generate more datasets. As I said earlier, education is an obvious one, but so is mobile phone data. All these companies have their licences. If a condition of the licence was that the data on how people move around the UK became a public asset, that would be hugely beneficial to policy formation. If we really understood how, why and when people move, we would make much better decisions. We could save ourselves huge amounts of money. We really ought to have this as a deep focus of government policy.
My Lords, I have far too little time to do justice to this subject. We on these Benches welcome this amendment. It is entirely consistent with the sovereign health fund proposed by Future Care Capital and, indeed, with the proposals from the Tony Blair Institute for Global Change on a similar concept called the national data trust. Indeed, this concept formed part of our Liberal Democrat manifesto at the last general election, so of course I support the amendment.
It would be very useful to hear more about the national data library, including on its purpose and operation, as the noble Baroness, Lady Kidron, said. I entirely agree with her that there is a great need for a sovereign cloud service or services. Indeed, the inability to guarantee that data on the cloud is held in this country is a real issue that has not yet been properly addressed.