Data (Use and Access) Bill [Lords]

Preet Kaur Gill Excerpts
Wednesday 7th May 2025

Commons Chamber
Dame Caroline Dinenage

My right hon. Friend makes a very good observation, but the fact is that so much content has already been scraped. Crawlers are all over the intellectual property of so many of our creators, writers and publishers—so much so that we are almost in a position where we are shutting the gate after the horse has bolted. Nevertheless, we need to do what we can legislatively to get to a better place on this issue.

New clause 2 would simply require anyone operating web crawlers for training and developing AI models to comply with copyright law. It is self-evident and incontrovertible that AI developers looking to deploy their systems in the UK should comply with UK law, but they often claim that copyright is not very clear. I would argue that it is perfectly clear; it is just that sometimes they do not like it. It is a failure to abide by the law that is creating lawsuits around the world. The new clause would require all those marketing their AI models in the UK to abide by our gold-standard copyright regime, which is the basis that underpins our thriving creative industries.

New clause 3 would require web crawler operations and AI developers to disclose the who, what, why, and when crawlers are being used. It also requires them to use different crawlers for different purposes and to ensure that rights holders are not punished for blocking them. A joint hearing of the Culture, Media and Sport Committee and the Science, Innovation and Technology Committee heard how publishers are being targeted by thousands of web crawlers with the intention of scraping content to sell to AI developers. We heard that many, if not most, web crawlers are not abiding by current opt-out protocols—robots.txt, for example. To put it another way, some developers of large language models are buying data scraped by third-party tech companies, in contravention of robots.txt protocols, to evade accusations of foul play. All this does is undermine existing licensing and divert revenues that should be returning to our creative industries and news media sector. New clause 3 would provide transparency over who is scraping copyrighted works and give creators the ability to assert and enforce their rights.

New clause 4 would require AI developers to be transparent about what data is going into their AI models. Transparency is fundamental to this debate. It is what we should all be focusing on. We are already behind the drag curve on this. California has introduced transparency requirements, and no one can say that the developers are fleeing Silicon Valley just yet.

New clause 20, tabled by the official Opposition, also addresses transparency. It would protect the AI sector from legal action by enabling both sides to come to the table and get a fair deal. A core part of this new clause is the requirement on the Secretary of State to commit to a plan to help support creators where their copyright has been used in AI by requiring a degree of transparency.

New clause 5 would provide the means by which we could enforce the rules. It would give the Information Commissioner the power to investigate, assess and sanction bad actors. It would also entitle rights holders to recover damages for any losses suffered, and to injunctive relief. Part of the reason why rights holders are so concerned is that the vast majority of creators do not have deep enough pockets to take on AI developers. How can they take on billion-dollar big tech companies when those companies have the best lawyers that money can buy, who can bog cases down in legislation and red tape? Rights holders need a way of enforcing their rights that is accessible, practical and fair.

The Government’s AI and copyright consultation says that it wants to ensure

“a clear legal basis for AI training with copyright material”.

That is what the new clauses that I have spoken to would deliver. Together they refute the tech sector’s claims of legal uncertainty, while providing transparency and enforcement capabilities for creators.

Ultimately, transparency is the main barrier to greater collaboration between AI developers and creators. Notwithstanding some of the unambitious Government amendments, the Opposition’s amendments would provide the long-overdue redress to protect our creative industries by requiring transparency and a widening of the scope of those who are subject to copyright laws.

The amendments would protect our professional creators and journalists, preserve the pipeline of young people looking to make a career in these sectors themselves, and cement the UK as a genuine creative industries superpower, maintaining our advantage in the field of monetising intellectual property. One day we may make a commercial advantage out of the fact that we are the place where companies can set up ethical AI companies—we could be the envy of the world.

Preet Kaur Gill (Birmingham Edgbaston) (Lab/Co-op)

I rise to support the Bill and speak to new clauses 22 and 23 tabled in my name. The measures in the Bill will unlock the power of data to grow the economy, to improve public services and make people’s lives easier. By modernising the way in which consumers and businesses can safely share data, the Bill will boost the economy by an estimated £10 billion over the next decade. The Bill will also make our public services more efficient and effective, saving our frontline workers from millions of hours of bureaucracy every year, which they can use to focus on keeping us safe and healthy.