Lord Freyberg extracts from Data (Use and Access) Bill [HL] (18th December 2024)

Data (Use and Access) Bill [HL]

Lord Freyberg Excerpts

Future Day:
Mon 13th Jan 2025
Grand Committee

Committee stage
Wednesday 18th December 2024


Grand Committee

Amendment Paper: HL Bill 40-IV Fourth marshalled list for Grand Committee (17 Dec 2024)

We have a rich and impactful creative sector. The reach of our artists, the soft power of our storytellers in all formats, the inventiveness of our designers and the skill of our musicians are legendary. The Government’s industrial strategy rightly recognises the creative industries and the tech sector as two of the UK’s priority growth-driving industries. The Government talk about balancing two competing sides, but they are neither the same nor equal. One is a creator and one is a distributor, regurgitator or, perhaps more generously, secondary user. As in all supply lines, you need to pay for your raw material to make something new. The Government will not achieve growth by simply allowing one growth area to cannibalise the other. Since the vast majority of benefit from AI scraping accrues to the US, it seems short-sighted, possibly criminal, to put the UK’s uniquely successful and profitable creative industries at the mercy of the predatory gen AI companies. I beg to move.

Lord Freyberg (CB)


My Lords, I support Amendments 204, 205 and 206, to which I have attached my name. In doing so, I declare my interest as someone with a long-standing background in the visual arts and as an artist member of the Design and Artists Copyright Society.

These amendments, tabled and superbly moved by my noble friend and supported by the noble Lords, Lord Stevenson and Lord Clement-Jones, seek to address a deep crisis in the creative sector whereby millions upon millions of creative works have been used to train general-purpose or generative AI models without permission or pay. While access to data is a fundamental aspect of this Bill, which in many cases has positive and legitimate aims, the unauthorised scraping of copyright-protected artworks, news stories, books and so forth for the use of generative AI models has significant downstream impacts. It affects the creative sector’s ability to grow economically, to maximise their valuable assets and to retain the authenticity that the public rely on.

AI companies have used artists’ works in the training, development and deployment of AI systems without consent, despite this being a requirement under UK copyright law. As has been said, the narrow exception to copyright for text and data mining for specific research purposes does not extend to AI models, which have indiscriminately scraped creative content such as images without permission, simply to build commercial products that allow users to generate their own versions of a Picasso or a David Hockney work.

This amendment would clarify the steps that operators of web crawlers and general-purpose AI models must take to comply with UK copyright law. It represents a significant step forward in resolving the legal challenges brought by rights holders against AI companies over their training practices. Despite high-profile cases arising in the USA and the UK over unauthorised uses of content by AI companies, the reality is that individual artists simply cannot access judicial redress, given the prohibitive cost of litigation.

DACS, which represents artists’ copyright, surveyed its members and found that they were not technophobic or against AI in principle but that their concerns lay with the legality and ethics of current AI operators. In fact, 84% of respondents would sign up for a licensing mechanism to be paid when their work is used by an AI with their consent. This amendment would clarify that remuneration is owed for AI companies’ use of artists’ works across the entire development life cycle, including during the pre-training and fine-tuning stages.

Licensing would additionally create the legal certainty needed for AI companies to develop their products in the UK, as the unlawful use of works creates a litigation risk which deters investment, especially from SMEs that cannot afford litigation. DACS has also been informed by its members that commissioning clients have requested artists not to use AI products in order to avoid liability issues around its input and output, demonstrating a lack of trust or uncertainty about using AI.

This amendment would additionally settle ongoing arguments around whether compliance with UK copyright law is required where AI training takes place in other jurisdictions. By affirming its applicability where AI products are marketed in the UK, the amendment would ensure that both UK-based artists and AI companies are not put at a competitive disadvantage due to international firms’ ability to conduct training in a different jurisdiction.

One of the barriers to licensing copyright is the lack of transparency over what works have been scraped by AI companies. The third amendment in this suite of proposals, Amendment 206, seeks to address this. It would require operators of web crawlers and general-purpose AI models to be transparent about the copyright works they have scraped.

Currently, artists and creators face significant challenges in protecting their intellectual property rights in the age of AI. While tools such as Spawning AI’s “Have I Been Trained?” attempt to help creators identify whether their work has been used in AI training datasets, these initiatives provide only surface-level information. Creators may learn that their work was included in training data, but they remain in the dark about crucial details—specifically, how their work was used and which companies used it. This deeper level of transparency is essential for artists to enforce their IP rights effectively. Unfortunately, the current documentation provided by AI companies, such as data cards and model cards, falls short of delivering this necessary transparency, leaving creators without the practical means to protect their work.

Amendment 206 addresses the well-known black box issue that currently plagues the AI market, by requiring the disclosure of information about the URLs accessed by internet scrapers, information that can be used to identify individual works, the timeframe of data collection and the type of data collected, among other things. The US Midjourney litigation is a prime example of why this is necessary for UK copyright enforcement. It was initiated only after a leak revealed the names of more than 16,000 non-consenting artists whose works were allegedly used to train the tool.

Creators, including artists, should not find themselves in a position where they must rely on leaks to defend their intellectual property rights. By requiring AI companies to regularly update their own records, detailing what works were used in the training process and providing this to rights holders on request, this amendment could also create a vital cultural shift towards accountability. This would represent an important step away from the “Move fast and break things” culture pervasive amongst the Silicon Valley-based AI companies at the forefront of AI development, and a step towards preserving the gold-standard British IP framework.

Lastly, I address Amendment 205, which requires operators of internet crawlers and general-purpose AI models to be transparent about the identity and purpose of their crawlers, and not to penalise copyright holders who choose to deny scraping for AI by down-ranking their content in, or removing their content from, a search engine. Operators of internet crawlers that scrape artistic works and other copyright-protected content can obscure their identity, making it difficult and time-consuming for individual artists and the entities that represent their copyright interests to identify these uses and seek redress for illegal scraping.

Inclusion in search-engine results is crucial for visual artists, who rely on the visibility these provide for their work to build their reputation and client base and generate sales. At present, web operators that choose to deny scraping by internet crawlers risk the down-ranking or even removal of their content from search engines, as the most commonly used tools cannot distinguish do-not-train protocols added to a site from instructions that block search indexing. This amendment will ensure that artists who choose to deny scraping for AI training are not disadvantaged by current technical restrictions and lose out on the exposure generated by search engines.

Finally, I will say a few words about the Government’s consultation launched yesterday, because it exposes a deeply troubling approach to creators’ IP rights, as has already been said so eloquently by the noble Baroness. For months, we have been urged to trust the Government to find the right balance between creators’ rights and AI innovation, yet their concept of balance has now been revealed for what it truly is: an incredibly unfair trade-off that gives away the rights of hundreds of thousands of creators to AI firms in exchange for vague promises of transparency.

Their proposal is built on a fundamentally flawed premise—promoted by tech lobbyists—that there is a lack of clarity in existing copyright law. This is completely untrue: the use of copyrighted content by AI companies without a licence is theft on a mass scale, as has already been said, and there is no objective case for the new text and data-mining exception. What we find in this consultation is a cynical rebranding of the opt-out mechanism as a rights reservation system. While they are positioning this as beneficial for rights holders through potential licensing revenues, the reality is that this is not achievable, yet the Government intend to leave it to Ministers alone to determine what constitutes

“effective, accessible, and widely adopted”

protection measures.

This is deeply concerning, given that no truly feasible rights reservation system for AI has been implemented anywhere in the world. Rights holders have been unequivocal: opt-out mechanisms—whatever the name they are given—are fundamentally unworkable in practice. In today’s digital world, where content can be instantly shared by anyone, creators are left powerless to protect their work. This hits visual artists particularly hard, as they must make their work visible to earn a living.

The evidence from Europe serves as a stark warning: opt-out provisions have failed to protect creators’ rights, forcing the EU to introduce additional transparency requirements in the recent AI Act. Putting it bluntly, simply legalising unauthorised use of creative works cannot be the answer to mass-scale copyright infringement. This is precisely why our proposed measures are crucial: they will maintain the existing copyright framework whereby AI companies must seek licences, while providing meaningful transparency that enables copyright holders to track the use of their work and seek proper redress, rather than blindly repeating proven failures.

The Earl of Clancarty (CB)


My Lords, I speak in support of my noble friend Lady Kidron’s amendments. I declare an interest as a visual artist, and of course visual creators, as my noble friend Lord Freyberg has very well described, are as much affected by this as musicians, journalists and novelists. I am particularly grateful to the Design and Artists Copyright Society and the Authors’ Licensing and Collecting Society for their briefings.

A particular sentence in the excellent briefing for this debate by the News Media Association, referred to by my noble friend Lady Kidron, caught my eye:

“There is no ‘balance’ to be struck between creators’ copyrights and GAI innovation: IP rights are central to GAI innovation”.

This is a crucial point. One might say that data does not grow on a magic data tree. All data originates from somewhere, and that will include data produced creatively. One might also say that such authorship should be seen to precede any interests in use and access. It certainly should not be something tagged on to the end, as an afterthought. I appreciate that the Government will be looking at these things separately, but concerns of copyright should really be part of any Bill where data access is being legislated for. As an example, we are going to be discussing the smart fund a bit later in an amendment proposed by the noble Lord, Lord Bassam, but I can attest to how tricky it was getting that amendment into a Bill that should inherently be accommodating these interests.

--- Later in debate ---

Lord Tarassenko (CB)


My Lords, I support my noble friend Lady Kidron’s Amendment 211, to which I have put my name. I speak not as a technophobe but as a card-carrying technophile. I declare an interest as, for the past 15 years, I have been involved in the development of algorithms to analyse NHS data, mostly from acute NHS trusts. This is possible under current regulations, because all the research projects have received medical research ethics approval, and I hold an honorary contract with the local NHS trust.

This amendment is, in effect, designed to scale up existing provisions and make sure that they are applied to public sector data sources such as NHS data. By classifying such data as sovereign data assets, it would be possible to make it available not only to individual researchers but to industry—UK-based SMEs and pharmaceutical and big tech companies—under controlled conditions. One of these conditions, as indicated by proposed new subsection (6), is to require a business model where income is generated for the relevant UK government department from access fees paid by authorised licence holders. Each government department should ensure that the public sector data it transfers to the national data library is classified as a sovereign data asset, which can then be accessed securely through APIs acting

“as bridges between each sovereign data asset and the client software of the authorized licence holders”.

In the time available, I will consider the Department of Health and Social Care. The report of the Sudlow review, Uniting the UK’s Health Data: A Huge Opportunity for Society, published last month, sets out what could be achieved through linking multiple NHS data sources. The Academy of Medical Sciences has fully endorsed the report:

“The Sudlow recommendations can make the UK’s health data a truly national asset, improving both patient care and driving economic development”.

There is little difference, if any, between health data being “a truly national asset” and “a sovereign asset”.

Generative AI has the potential to extract clinical value from linked datasets in the various secure data environments within the NHS and to deliver a step change in patient care. It also has the potential to deliver economic value, as the application of AI models to these rich, multimodal datasets will lead to innovative software products being developed for early diagnosis and personalised treatment.

However, it seems that the rush to generate economic value is preceding the establishment of a transparent licensing system, as in proposed new subsection (3), and the setting up of a coherent business model, as in proposed new subsection (6). As my noble friend Lady Kidron pointed out, the provisions in this amendment are urgently needed, especially as the chief data and analytics officer at NHS England is reported as having said, at a recent event organised by the Health Service Journal and IBM, that the national federated data platform will soon be used to train different types of AI model. The two models mentioned in the speech were OpenAI’s proprietary ChatGPT model and Google’s medical AI, which is based on its proprietary large language model, Gemini. So, the patient data in the national federated data platform being built by Palantir, which is a US company, is, in effect, being made available to fine-tune large language models pretrained by OpenAI and Google—two big US tech companies.

As a recent editorial in the British Medical Journal argued:

“This risks leaving the NHS vulnerable to exploitation by private technology companies whose offers to ‘assist’ with infrastructure development could result in loss of control over valuable public assets”.

It is vital for the health of the UK public sector that there is no loss of control resulting from premature agreements with big tech companies. These US companies seek privileged access to highly valuable assets which consist of personal data collected from UK citizens. The Government must, as a high priority, determine the rules for access to these sovereign data assets along the lines outlined in this amendment. I urge the Minister to take on board both the aims and the practicalities of this amendment before any damaging loss of control.

Lord Freyberg (CB)


My Lords, I support Amendment 211 moved by my noble friend Lady Kidron, which builds on earlier contributions in this place made by the noble Lords, Lord Mitchell, Lord Stevenson, Lord Clement-Jones, and myself, as long ago as 2018, about the need to maximise the social, economic and environmental value that may be derived from personal data of national significance and, in particular, data controlled by our NHS.

The proposed definition of “sovereign data assets” is, in some sense, broad. However, the intent to recognise, protect and maximise their value in the public interest is readily inferred. The call for a transparent licensing regime to provide access to such assets and the mention of preferential access for individuals and organisations headquartered in the UK also make good sense, as the overarching aim is to build and maintain public trust in third-party data usage.

Crucially, I fully support provisions that would require the Secretary of State to report on the value and anticipated financial return from sovereign data assets. Identifying a public body that considered itself able or willing to guarantee value for money proved challenging when this topic was last explored. For too long, past Governments have dithered and delayed over the introduction of provisions that explicitly recognise the need to account for and safeguard the investment made by taxpayers in data held by public and arm’s-length institutions and associated data infrastructure—something that we do as a matter of course where the tangible assets that the National Audit Office monitors and reports on are concerned.

In recent weeks, the Chancellor of the Exchequer has emphasised the importance of recovering public funds “lost” during the Covid-19 pandemic. Yet this focus raises important questions about other potential revenue streams that were overlooked, particularly regarding NHS data assets. In 2019, Ernst & Young estimated that a curated NHS dataset could generate up to £5 billion annually for the UK while also delivering £4.6 billion in yearly patient benefits through improved data infrastructure. This leads to an obvious question: who is tracking whether these substantial economic and healthcare opportunities are being realised? Who is ensuring that these projected benefits, both financial and clinical, are actually flowing back into our healthcare system?

As we enter the age of AI, public discourse often fixates on potential risks while overlooking a crucial opportunity—namely, the rapidly increasing value of publicly controlled data and its potential to drive innovation and insights. This raises two crucial questions. First, how might we capitalise on the upside of this technological revolution to maximise the benefits on behalf of the public? Secondly, and more specifically, how will Parliament effectively scrutinise any eventual trade deal entered into with, for example, the United States of America, which might focus on a more limited digital chapter, in the absence of either an accepted valuation methodology or a transparent licensing system for use in providing access to valuable UK data assets?

Will the public, faced with a significant tax burden to improve public services and repeated reminders of the potential for data and technology to transform our NHS, trust the Government if they enable valuable digital assets to be stripped today only to be turned tomorrow into cutting-edge treatments that we can ill afford to purchase and that benefit companies paying taxes overseas? To my mind, there remains a very real risk that the UK, as my noble friend Lady Kidron rightly stated, will inadvertently give away potentially valuable digital assets without there being appropriate safeguards in place. I therefore welcome the intent of Amendment 211 to put that right in the public interest.