Data (Use and Access) Bill [HL] Debate
Earl of Clancarty (Crossbench - Excepted Hereditary)
Grand Committee
My Lords, I support Amendments 204, 205 and 206, to which I have attached my name. In doing so, I declare my interest as someone with a long-standing background in the visual arts and as an artist member of the Design and Artists Copyright Society.
These amendments, tabled and superbly moved by my noble friend and supported by the noble Lords, Lord Stevenson and Lord Clement-Jones, seek to address a deep crisis in the creative sector whereby millions upon millions of creative works have been used to train general-purpose or generative AI models without permission or pay. While access to data is a fundamental aspect of this Bill, which in many cases has positive and legitimate aims, the unauthorised scraping of copyright-protected artworks, news stories, books and so forth for the use of generative AI models has significant downstream impacts. It affects the creative sectors’ ability to grow economically, to maximise their valuable assets and to retain the authenticity that the public rely on.
AI companies have used artists’ works in the training, development and deployment of AI systems without consent, despite this being a requirement under UK copyright law. As has been said, the narrow exception to copyright for text and data mining for specific research purposes does not extend to AI models, which have indiscriminately scraped creative content such as images without permission, simply to build commercial products that allow users to generate their own versions of a Picasso or a David Hockney work.
This amendment would clarify the steps that operators of web crawlers and general-purpose AI models must take to comply with UK copyright law. It represents a significant step forward in resolving the legal challenges brought by rights holders against AI companies over their training practices. Despite high-profile cases arising in the USA and the UK over unauthorised uses of content by AI companies, the reality is that individual artists simply cannot access judicial redress, given the prohibitive cost of litigation.
DACS, which represents artists’ copyright, surveyed its members and found that they were not technophobic or against AI in principle but that their concerns lay with the legality and ethics of current AI operators. In fact, 84% of respondents would sign up for a licensing mechanism to be paid when their work is used by an AI with their consent. This amendment would clarify that remuneration is owed for AI companies’ use of artists’ works across the entire development life cycle, including during the pre-training and fine-tuning stages.
Licensing would additionally create the legal certainty needed for AI companies to develop their products in the UK, as the unlawful use of works creates a litigation risk which deters investment, especially from SMEs that cannot afford litigation. DACS has also been informed by its members that commissioning clients have requested artists not to use AI products in order to avoid liability issues around their input and output, demonstrating a lack of trust in, or uncertainty about, using AI.
This amendment would additionally settle ongoing arguments around whether compliance with UK copyright law is required where AI training takes place in other jurisdictions. By affirming its applicability where AI products are marketed in the UK, the amendment would ensure that both UK-based artists and AI companies are not put at a competitive disadvantage due to international firms’ ability to conduct training in a different jurisdiction.
One of the barriers to licensing copyright is the lack of transparency over what works have been scraped by AI companies. The third amendment in this suite of proposals, Amendment 206, seeks to address this. It would require operators of web crawlers and general-purpose AI models to be transparent about the copyright works they have scraped.
Currently, artists and creators face significant challenges in protecting their intellectual property rights in the age of AI. While tools such as Spawning AI’s “Have I Been Trained?” attempt to help creators identify whether their work has been used in AI training datasets, these initiatives provide only surface-level information. Creators may learn that their work was included in training data, but they remain in the dark about crucial details—specifically, how their work was used and which companies used it. This deeper level of transparency is essential for artists to enforce their IP rights effectively. Unfortunately, the current documentation provided by AI companies, such as data cards and model cards, falls short of delivering this necessary transparency, leaving creators without the practical means to protect their work.
Amendment 206 addresses the well-known black box issue that currently plagues the AI market, by requiring the disclosure of information about the URLs accessed by internet scrapers, information that can be used to identify individual works, the timeframe of data collection and the type of data collected, among other things. The US Midjourney litigation is a prime example of why this is necessary for UK copyright enforcement. It was initiated only after a leak revealed the names of more than 16,000 non-consenting artists whose works were allegedly used to train the tool.
Creators, including artists, should not find themselves in a position where they must rely on leaks to defend their intellectual property rights. By requiring AI companies to regularly update their own records, detailing what works were used in the training process and providing this to rights holders on request, this amendment could also create a vital cultural shift towards accountability. This would represent an important step away from the “Move fast and break things” culture pervasive amongst the Silicon Valley-based AI companies at the forefront of AI development, and a step towards preserving the gold-standard British IP framework.
Lastly, I address Amendment 205, which requires operators of internet crawlers and general-purpose AI models to be transparent about the identity and purpose of their crawlers, and not to penalise copyright holders who choose to deny scraping for AI by down-ranking their content in, or removing their content from, a search engine. Operators of internet crawlers that scrape artistic works and other copyright-protected content can obscure their identity, making it difficult and time-consuming for individual artists and the entities that represent their copyright interests to identify these uses and seek redress for illegal scraping.
Inclusion in search-engine results is crucial for visual artists, who rely on the visibility these provide for their work to build their reputation and client base and generate sales. At present, web operators that choose to deny scraping by internet crawlers risk the down-ranking or even removal of their content from search engines, as the most commonly used tools cannot distinguish between do-not-train protocols added to a site. This amendment will ensure that artists who choose to deny scraping for AI training are not disadvantaged by current technical restrictions and do not lose out on the exposure generated by search engines.
Finally, I will say a few words about the Government’s consultation launched yesterday, because it exposes a deeply troubling approach to creators’ IP rights, as has already been said so eloquently by the noble Baroness. For months, we have been urged to trust the Government to find the right balance between creators’ rights and AI innovation, yet their concept of balance has now been revealed for what it truly is: an incredibly unfair trade-off that gives away the rights of hundreds of thousands of creators to AI firms in exchange for vague promises of transparency.
Their proposal is built on a fundamentally flawed premise, promoted by tech lobbyists, that there is a lack of clarity in existing copyright law. This is completely untrue: the use of copyrighted content by AI companies without a licence is theft on a mass scale, as has already been said, and there is no objective case for the new text and data-mining exception. What we find in this consultation is a cynical rebranding of the opt-out mechanism as a rights reservation system. While they are positioning this as beneficial for rights holders through potential licensing revenues, the reality is that this is not achievable, yet the Government intend to leave it to Ministers alone to determine what constitutes “effective, accessible, and widely adopted” protection measures.
This is deeply concerning, given that no truly feasible rights reservation system for AI has been implemented anywhere in the world. Rights holders have been unequivocal: opt-out mechanisms—whatever the name they are given—are fundamentally unworkable in practice. In today’s digital world, where content can be instantly shared by anyone, creators are left powerless to protect their work. This hits visual artists particularly hard, as they must make their work visible to earn a living.
The evidence from Europe serves as a stark warning: opt-out provisions have failed to protect creators’ rights, forcing the EU to introduce additional transparency requirements in the recent AI Act. Putting it bluntly, simply legalising unauthorised use of creative works cannot be the answer to mass-scale copyright infringement. This is precisely why our proposed measures are crucial: they will maintain the existing copyright framework whereby AI companies must seek licences, while providing meaningful transparency that enables copyright holders to track the use of their work and seek proper redress, rather than blindly repeating proven failures.
My Lords, I speak in support of my noble friend Lady Kidron’s amendments. I declare an interest as a visual artist, and of course visual creators, as my noble friend Lord Freyberg has very well described, are as much affected by this as musicians, journalists and novelists. I am particularly grateful to the Design and Artists Copyright Society and the Authors’ Licensing and Collecting Society for their briefings.
A particular sentence in the excellent briefing for this debate by the News Media Association, referred to by my noble friend Lady Kidron, caught my eye:
“There is no ‘balance’ to be struck between creators’ copyrights and GAI innovation: IP rights are central to GAI innovation”.
This is a crucial point. One might say that data does not grow on a magic data tree. All data originates from somewhere, and that will include data produced creatively. One might also say that such authorship should be seen to precede any interests in use and access. It certainly should not be something tagged on to the end, as an afterthought. I appreciate that the Government will be looking at these things separately, but concerns of copyright should really be part of any Bill where data access is being legislated for. As an example, we are going to be discussing the smart fund a bit later in an amendment proposed by the noble Lord, Lord Bassam, but I can attest to how tricky it was getting that amendment into a Bill that should inherently be accommodating these interests.