Data (Use and Access) Bill [HL] Debate
Earl of Effingham (Conservative - Excepted Hereditary)
Grand Committee
My Lords, I have very little to add because I entirely support all these amendments. I am always concerned when I see the words “lack of clarity” in a context like this. The basic principle of copyright law, whereby one provides a licence and is paid for that licence by agreement, has been well established. There is no need for any further clarity in this context, as in earlier contexts of copyright law.
I should declare an interest as the chairman of IPSO, the regulator of 95% of the printed news media and its online versions. I have been impressed by the News Media Association’s briefings. It has identified important issues. I am extremely concerned about what appears to have been a considerable amount of lobbying by big tech in this area. It reminds me of what took place when your Lordships’ House considered the Digital Markets, Competition and Consumers Bill. A low point for me was when we were told that it would be very difficult to establish a proper system otherwise Google’s human rights would be somehow infringed. It is extremely important that this so-called balance does not mean that those who create original material protected by the copyright Acts have their rights violated in order to satisfy the interests of big tech.
My Lords, my noble friend Lord Camrose apologises to the Committee but he has had to leave early for unavoidable family reasons. Needless to say, he will read Hansard carefully.
It is our belief that a society that fails to value products of the mind will never be an innovative society. We are fortunate to live in that innovative society now and we must fight to ensure it remains one. Data scraping and AI crawlers pose both novel and substantial challenges to copyright protection laws and mechanisms. His Majesty’s Official Opposition are pleased that these amendments have been brought forward to address those challenges, which differ from those posed by traditional search engine crawlers.
Generally speaking, in creating laws about data we have been able to follow a north star of replicating online the values and behaviours we take for granted offline. This was of real service to us in the Online Safety Act, for example. In many ways, however, that breaks down when we come to AI and copyright. Offline, we are happy to accept that an artist, author, musician or inventor has been influenced by existing works in their field. Indeed, we sometimes celebrate that fact, and we have a strong intuitive sense of when influence has crossed the line into copying. This means that we can form an intuitive assessment of whether a copyright has been breached offline based on what creators produce, not what content they have consumed, which we expect to be extensive. With an AI crawler, that intuition and model break down. There are simply too many variables and too much information. We have no choice but to go after the inputs.
With that in mind, it would be helpful to set out the differences between traditional search engine crawlers and AI crawlers. Indexing crawlers used by the search engines we are all familiar with store information in their indexes. This then determines the results of the search. However, AI crawlers generally fall into two categories. The training crawlers scrape the web, collecting data used to train large language models. Live retrieval crawlers pull in live data from the web and incorporate it into chatbot responses.
Historically, the robots exclusion protocol—the plain text file identified as robots.txt—has been embedded into website domains, specifying to crawlers what data they can and cannot access in part or all of the domain. This has been used for the past 30 years to protect information or IP from indexing crawlers. Although the robots exclusion protocol has worked relatively well for many years, in some ways it is not fit for the web as it exists today—especially when dealing with AI crawlers.
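For readers unfamiliar with the file described above, the following is a minimal sketch of how a well-behaved crawler might consult a robots.txt file, using Python's standard `urllib.robotparser` module. The file contents, the crawler names (`ExampleSearchBot`, `SomeOtherBot`) and the `example.org` URLs are invented for illustration and do not come from the debate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the crawler names are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named search crawler may read public pages but not /private/.
search_public = parser.can_fetch(
    "ExampleSearchBot", "https://example.org/news/story.html"
)
search_private = parser.can_fetch(
    "ExampleSearchBot", "https://example.org/private/draft.html"
)

# Any crawler not named in the file falls under the catch-all "*" rule.
other_public = parser.can_fetch(
    "SomeOtherBot", "https://example.org/news/story.html"
)

print(search_public, search_private, other_public)
```

The rules are keyed on the name a crawler announces for itself, which is why the identification problem raised in the following paragraphs matters.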
To exclude crawlers from websites, we must be able to identify them. This was, for the most part, workable in the early days of the internet when there were relatively few search engines and, correspondingly, few indexing crawlers. However, given the rapidly increasing number of AI services, with their corresponding crawlers trawling the web, it becomes impossible to exclude them all. To make matters worse, some AI crawlers operate in relative secrecy. Their names, which can be viewed through domain holder access logs, reveal little of their purpose.
Furthermore, the robots exclusion protocol is not an enforceable agreement; it is more like a polite request. As a result, a crawler can simply ignore a robots.txt file and scrape the data anyway. It is also worth noting that even if a crawler acknowledges and obeys a robots.txt file, the data may be inadvertently scraped from a third-party source that has lifted the data or intellectual property either manually or using a crawler that does not obey robots.txt files. That data can then be made available without the protection of the robots exclusion protocol. This raises an unsettling question: how do we protect intellectual property and data more generally from these AI crawlers, whose developers decline the voluntary limitations placed on them?
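Because the protocol is only advisory, a site owner can at best detect after the fact that a crawler ignored it. The sketch below assumes a simplified, hypothetical access log of (user agent, requested path) pairs and compares each request with the site's robots.txt rules; the crawler names and paths are invented, and real server logs have a different format.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /archive/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

# Hypothetical, simplified access-log records: (user agent, requested path).
ACCESS_LOG = [
    ("PoliteBot/1.0", "/news/today.html"),
    ("QuietScraper/0.3", "/archive/1999/essay.html"),
    ("PoliteBot/1.0", "/index.html"),
    ("QuietScraper/0.3", "/archive/2001/photo.html"),
]


def find_ignored_rules(robots_txt, log):
    """Group the requests that robots.txt asked crawlers not to make."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = {}
    for agent, path in log:
        # can_fetch only reports what the file requests; it blocks nothing.
        if not parser.can_fetch(agent, path):
            violations.setdefault(agent, []).append(path)
    return violations


violations = find_ignored_rules(ROBOTS_TXT, ACCESS_LOG)
for agent, paths in violations.items():
    print(agent, "requested disallowed paths:", paths)
```

Nothing in this check stops the request itself, and it relies on the crawler reporting its user agent honestly.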
At this point, I turn to the amendments. Amendment 204 is a great initial step toward requiring crawler operators to respect UK copyright law. However, this provision would apply only to products and services of such operators that are marketed in the United Kingdom. What about those from outside the UK? Indeed, as my noble friend Lord Camrose has often argued, any AI lab that does not want to follow our laws can infringe the same copyright with impunity in another jurisdiction. Unless and until we address the offshoring problem, we continue to have real concerns as to the enforceability of any regulations we implement here.
I will address the individual subsections in Amendment 205. Proposed new subsection (1) would require crawlers to reveal their identity, including their name, who is responsible for them, their purpose, who receives their scraped data, and a point of contact. This is an excellent idea, although we are again concerned about enforceability due to offshoring. Proposed new subsection (2) requires this information to be easily accessible. We are sure this would be beneficial, but our concerns remain about infringements in other jurisdictions.
Requiring the deployment of crawlers with distinct purposes in proposed new subsection (3) is an excellent idea as it would allow data controllers to choose what data can be trawled and for what purpose, to the extent possible using the robots exclusion protocol. We do, however, have concerns about proposed new subsection (4). We are not sure how it would be possible for the exclusion of an AI crawler not to impact the findability of content. We assume this could be achieved only if we mandated the continued use of indexing crawlers.
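As an illustration of how purpose-specific crawlers could let a site owner permit one use and refuse another, the sketch below models the disclosure items mentioned in the debate as a record and generates robots.txt rules from them. The field names are a paraphrase, not the amendment's wording, and every crawler name, operator and address is hypothetical.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class CrawlerDisclosure:
    """Hypothetical record of what a crawler operator might disclose."""
    name: str            # user-agent token the crawler announces
    operator: str        # who is responsible for the crawler
    purpose: str         # e.g. "indexing", "training", "live-retrieval"
    data_recipient: str  # who receives the scraped data
    contact: str         # point of contact


def robots_txt_for(disclosures, allowed_purposes):
    """Build robots.txt text that admits only crawlers with allowed purposes."""
    blocks = []
    for d in disclosures:
        rule = "Allow: /" if d.purpose in allowed_purposes else "Disallow: /"
        blocks.append(f"User-agent: {d.name}\n{rule}\n")
    return "\n".join(blocks)


DISCLOSURES = [
    CrawlerDisclosure("ExampleIndexBot", "Example Search Ltd", "indexing",
                      "Example Search Ltd", "crawler@search.example"),
    CrawlerDisclosure("ExampleTrainingBot", "Example AI Ltd", "training",
                      "Example AI Ltd", "crawler@ai.example"),
]

# The site owner chooses to permit indexing but not model training.
robots_txt = robots_txt_for(DISCLOSURES, allowed_purposes={"indexing"})

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
index_allowed = parser.can_fetch("ExampleIndexBot", "/articles/1")
training_allowed = parser.can_fetch("ExampleTrainingBot", "/articles/1")
print(index_allowed, training_allowed)
```

This choice is only possible when an operator runs separate, separately named crawlers for each purpose; a single crawler used for both indexing and training would leave the site owner with an all-or-nothing decision.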
As for Amendment 206, requiring crawler operators to regularly disclose the information scraped from copyrighted sources and make it accessible to copyright holders on their request is an interesting suggestion. We would be curious to hear how this would work in practice, particularly given the vast scale—some of those models crawl billions of documents, generating trillions of tokens. Where would that data be published? Given the scale of data-scraping, how would copyright holders know where to look for this information? If the operator was based outside the UK, how would disclosure be enforced? Our view is that watermarking technology can come to the rescue, dependent of course on an internationally accepted technical standard for machine-readable watermarks that contain licensing information.
My Lords, it is a pity that this debate is taking place so late. I thank the noble Lord, Lord Arbuthnot, for his kind remarks, but my work ethic feels under considerable pressure at this time of night.
All I will say is that this is a much better amendment than the one that the noble Baroness, Lady Kidron, put forward for the Data Protection and Digital Information Bill, and I very strongly support it. Not only is this horrifying in the context of the past Horizon cases, but I read a report about the Capture software, which is likely to have created shortfalls that led to sub-postmasters being prosecuted as well. This is an ongoing issue. The Criminal Cases Review Commission is reviewing five Post Office convictions in which the Capture IT system could be a factor, so we cannot say that this is about just Horizon, as there are the many other cases that the noble Baroness cited.
We need to change this common law presumption even more in the face of a world in which AI use, with all its flaws and hallucinations, is becoming ever present, and we need to do it urgently.
My Lords, I thank the noble Baroness, Lady Kidron, for tabling her amendment. We understand its great intentions, which we believe are to prevent another scandal similar to that of Horizon and to protect innocent people from having to endure what thousands of postmasters have undergone and suffered.
However, while this amendment would make it easier to challenge evidence derived from, or produced by, a computer or computer system, we are concerned that, should it become law, this amendment could be misused by defendants to challenge good evidence. Our fear is that, in determining the reliability of such evidence, we may create a battle of the expert witnesses. This will not only substantially slow down trials but result in higher costs. Litigation is already expensive, and we would aim not to introduce additional costs to an already costly process unless absolutely necessary.
From our perspective, the underlying problem in the Horizon scandal was not that computer systems were critically wrong or that people were wrong, but that the two in combination drove the terrible outcomes that we have unfortunately seen. For many industries, regulations require firms to conduct formal systems validation, with serious repercussions and penalties should companies fail to do so. It seems to us that the disciplines of systems validation, if required for other industries, would be both a powerful protection and considerably less disruptive than potentially far-reaching changes to the law.
My Lords, I thank the noble Baroness and the noble Lord, Lord Arbuthnot, for Amendment 207 and for raising this important topic. The noble Baroness and other noble Lords are right that this issue goes far wider than Horizon. We could debate what went wrong with Horizon, but the issues before us today are much wider than that.
The Government are agreed that we must prevent future miscarriages of justice. We fully understand the intention behind the amendment and the significance of the issue. We are actively considering this matter and will announce next steps in the new year. I reassure noble Lords that we are on the case with this issue.
In the meantime, as this amendment brings into scope evidence presented in every type of court proceeding and would have a detrimental effect on the courts and prosecution—potentially leading to unnecessary delays and, more importantly, further distress to victims—I must ask the noble Baroness whether she is content to withdraw it at this stage. I ask that on the basis that this is an ongoing discussion that we are happy to have with her.
My Lords, I have far too little time to do justice to this subject. We on these Benches welcome this amendment. It is entirely consistent with the sovereign health fund proposed by Future Care Capital and, indeed, with the proposals from the Tony Blair Institute for Global Change on a similar concept called the national data trust. Indeed, this concept formed part of our Liberal Democrat manifesto at the last general election, so of course I support the amendment.
It would be very useful to hear more about the national data library, including on its purpose and operation, as the noble Baroness, Lady Kidron, said. I entirely agree with her that there is a great need for a sovereign cloud service or services. Indeed, the inability to guarantee that data on the cloud is held in this country is a real issue that has not yet been properly addressed.
My Lords, I thank the noble Baroness, Lady Kidron, for moving this amendment. As she rightly identified, the UK has a number of publicly held data assets, many of which contain extremely valuable information. This data—I flag, by way of an example, NHS data specifically—could be extremely valuable to certain organisations, such as pharmaceutical companies.
We are drawn to the idea of licensing such data—indeed, we believe that we could charge an extremely good price—but we have a number of concerns. Most notably, what additional safeguards would be required, given its sensitivity? What would be the limits and extent of the licensing agreement? Would this status close off other routes to monetising the data? Would other public sector bodies be able to use the data for free? Can this not already be done without the amendment?
Although His Majesty’s Official Opposition of course recognise the wish to ensure that the UK taxpayer gets a fair return on our information assets held by public bodies and arm’s-length organisations, and we certainly agree that we need to look at licensing, we are not yet sure that this amendment is either necessary or sufficient. We once again thank the noble Baroness, Lady Kidron, for moving it. We look forward to hearing both her and the Minister’s thoughts on the matter.
My Lords, I am grateful to the noble Baroness, Lady Kidron, for her amendment. I agree with her that the public sector has a wealth of data assets that could be used to help our society achieve our missions and contribute to economic growth.
As well as my previous comments on the national data library, the Government’s recent Green Paper, Invest 2035: The UK’s Modern Industrial Strategy, makes it clear that we consider data access part of the modern business environment, so improving data access is integral to the UK’s approach to growth. However, we also recognise the value of our data assets as part of this approach. At the same time, it is critical that we use our data assets in a trustworthy and ethical way, as the noble Baroness, Lady Kidron, and the noble Lord, Lord Tarassenko, said, so we must tackle these issues carefully.
This is an active area of policy development for the Government, and we need to get it right. I must therefore ask the noble Baroness to withdraw her amendment. However, she started and provoked a debate that will, I hope, carry on; we would be happy to engage in that debate going forward.
My Lords, I shall be even shorter. Data centres and their energy consumption are important issues. I agree that at a suitable moment, probably not now, it would be very interesting to hear the Government’s views on that. Reports from UK parliamentary committees and the Government have consistently emphasised the critical importance of maintaining public trust in data use and AI, but sometimes the actions of the Government seem to go contrary to that. I support the noble Lord, Lord Holmes, in his call for essentially realising the benefits of AI while making sure that we maintain public trust.
My Lords, I thank my noble friend Lord Holmes of Richmond for tabling this amendment. As we all appreciate, taking stock of the effects of legislation is critical, as it allows us to see what has worked and what has not. Amendment 211B would require the Secretary of State to launch a consultation into the implications of the provisions of the Bill on the power usage and energy efficiency of data centres. His Majesty’s Official Opposition have no objection to the amendment’s aims but we wonder to what extent it is actually possible. By what means or benchmark can we identify whether a spike in energy usage is specifically due to a provision from this legislation, rather than as a result of some other factor? I should be most grateful if my noble friend could provide further detail on this matter in his closing speech.
Regarding Amendment 211C, we understand that much could be learned from a review of all data regulations and standards pertaining to the supply chains for financial, trade, and legal documents and products, although we wonder if this needs to happen the moment this Bill passes. Could this review not happen at any stage? By all means, let us do it sooner rather than later, but is it necessary to set a date in statute?
Moving on to Amendment 211D, we should certainly look to regulate the AI large language model sector to ensure that there are standards for the input and output of data for LLMs. However, this must be done in a way that does not stifle growth in this emerging industry.
Finally, we have some concerns about Amendment 211E. A national consultation on the use of individuals’ data is perhaps just too broad.
My Lords, listening to the noble Lord, Lord Lucas, is often an education, and today is no exception. I had no idea what local environmental records centres were, so I shall be very interested to hear what the Minister has to say in response.
My Lords, I thank my noble friend Lord Lucas for tabling Amendment 211F and all noble Lords for their brief contributions to this group.
Amendment 211F ensures that all the biodiversity data collected by or in connection with government is collected in local environment records centres to ensure that records are as good as possible. That data is then used by or in connection with government, so it is put to the best possible use.
The importance of sufficient and high-quality record collection cannot and must not be understated. With this in mind, His Majesty’s Official Opposition support the sentiment of the amendment in my noble friend’s name. These Benches will always champion matters related to biodiversity and nature recovery. In fact, many of my noble friends have raised concerns about biodiversity in Committee debates in your Lordships’ House on the Crown Estate Bill, the Water (Special Measures) Bill and the Great British Energy Bill. Indeed, they have tabled amendments that ensure that matters related to biodiversity appear at the forefront of draft legislation.
With that in mind, I am grateful to my noble friend Lord Lucas for introducing provisions, via Amendment 211F, which would require any planning application involving biodiversity net gain to include a data search report from the relevant local environmental records centre. I trust that the Minister has listened to the concerns raised collaboratively in the debate on this brief group. We must recognise the importance of good data collection and ensure that such data is used in the best possible way.
My Lords, I thank the noble Lord, Lord Lucas, for his Amendment 211F. I absolutely agree that local environmental records centres provide an important service. I reassure noble Lords that the Government’s digital planning programme is developing data standards and tools to increase the availability, accessibility and usability of planning data. This will transform people’s experience of planning and housing, including through local environmental records centres. On that basis, I must ask the noble Lord whether he is prepared to withdraw his amendment.