Data (Use and Access) Bill [HL] Debate
Viscount Colville of Culross (Crossbench - Excepted Hereditary)
Grand Committee

I have tabled Amendments 59, 62, 63 and 65, and I thank the noble Lord, Lord Clement-Jones, my noble friend Lady Kidron and the noble Viscount, Lord Camrose, for adding their names to them. I am sure that the Committee will agree that these amendments have some pretty heavyweight support. I also support Amendment 64, in the name of the noble Lord, Lord Clement-Jones, which is an alternative to my Amendment 63. Amendments 68 and 69 in this group also warrant attention.
I very much support the Government’s aim in Clause 67 to ensure that valuable research does not get discarded due to a lack of clarity around its use or because of an overly narrow distinction between the original and new purposes of the use of the data. The Government’s position is that this clause clarifies the law by incorporating into the Bill recitals to the original GDPR. However, while the effect is to encourage scientific research and development, it has to be seen in the context of the fast-evolving world of developments in AI and the way that AI developers, given the need for huge amounts of data to train their large language models, are reusing data.
My concern is that the scraping of vast amounts of data by these AI companies is often positioned as scientific research and in some cases is even supported by the production of academic papers. I ask the Minister to understand my concerns and those of many in the data community and beyond. The fact is that the lines between scientific research, as set out in Clause 67, and AI product development are blurred. This might not be the concern of the original recitals, but I beg to suggest to the Minister that, in the new world of AI, there should be concern about the definition presented in the Bill.
Like other noble Lords, I very much hope to make this country a centre of AI development, but I do not want this to happen at the expense of data subjects’ privacy and data protection. It costs at least £1 billion—even more, sometimes—to develop a large language model and, although the cost will soon go down, there is a huge financial incentive to scrape data that pushes the boundaries of what is legitimate. In this climate, it is important that the Bill closes any loopholes that allow AI developers to claim the protections offered by Clause 67. My Amendments 59, 62, 63 and 65 go some way to ensuring that this will not happen.
The definition of scientific research in proposed new paragraph 2, in Clause 67(1)(b), is drawn broadly. My concern is that many commercial developments of digital products, particularly those involving AI, could still claim to be, in the words of the clause, “reasonably … described as scientific”. AI model development usually involves a mix of purposes—not just developing its capabilities but also commercialising as it develops services. The exemption allowed for “purposes of technological development” makes me concerned that this vague area creates a threat whereby AI developers will misuse the provisions of the Bill to reuse personal data for any AI developments, provided that one of their goals is technological advancement.
Amendments 59 and 62, by inserting the word “solely” into proposed new paragraphs 2 and 3 in Clause 67, would disaggregate reuse of data for scientific research purposes from other purposes, ensuring that the only goal of reuse is scientific research.
An example of the threat under the present definition is shown by Meta’s recently allowing the reuse of Instagram users’ data to train its new generation of Llama models. When the news got out, it created a huge backlash, with more than half a million people reposting a viral hoax image that claimed to deny Meta the right to reuse their data to train AI. This caused the ICO to say that it was pleased that Meta had paused its data processing in response to users’ concerns, adding:
“It is crucial that the public can trust that their privacy rights will be respected from the outset”.
However, Meta could well claim under this clause that it is creating technological advancement which would allow it to reuse any data collected by users under the legitimate interest grounds for training the model. The Bill as it stands would not require the company to conduct its research in accordance with any of the features of genuine scientific research. These amendments go some way to rectify that.
Amendment 63 raises the test for what is deemed to be scientific research. At the moment, the public interest test is applied only to public health. I am pleased that NHS researchers will have to meet this threshold, but why should all researchers doing scientific work not have to adhere to it? Why should that test not be applied to all data reuse for scientific research? By deleting the public health exception, the public interest test would apply to all data reuse for scientific purposes.
The original intention of the RAS purposes of the GDPR supports public health being in the scientific interest. This is complemented by Amendment 65, which uses the tests for consent already laid out in Clause 68. The inclusion of ethical thresholds in the reuse of data should meet the highest levels of academic rigour and oversight envisaged in the original GDPR. It will demand not just ethical standards in research but also that the research be supervised by an independent research ethics committee that meets UKRI guidance. These requirements will ensure that the high standards of ethics that we expect from scientific research are applied in evaluating the exemption in Clause 67.
I do not want noble Lords to think that these amendments are thwarting the development of AI. There is plenty of AI research that is clearly scientific. Look at DeepMind's AlphaFold, which uses AI to analyse the shape of proteins so that they can be incorporated into future drug treatments, advancing pharmaceutical development. It is an AI model developed in accordance with the ethical standards expected from modern scientific research.
The Minister will argue that the definition has been taken straight from EU recitals. I therefore ask her to consider very seriously what has been said about this definition by the EU’s premier data body, the European Data Protection Supervisor, in its preliminary opinion on data protection and scientific research. In its executive summary, it states:
“The boundary between private sector research and traditional academic research is blurrier than ever, and it is ever harder to distinguish research with generalisable benefits for society from that which primarily serves private interests. Corporate secrecy, particularly in the tech sector, which controls the most valuable data for understanding the impact of digitisation and specific phenomena like the dissemination of misinformation, is a major barrier to social science research … there have been few guidelines or comprehensive studies on the application of data protection rules to research”.
It suggests that the rules should be interpreted in such a way that permits reuse only for genuine scientific research.
For the purpose of this preliminary opinion by the EDPS, the special data protection regime for scientific research is understood to apply if each of three criteria is met: first, personal data is processed; secondly, relevant sectorial standards of methodology and ethics apply, including the notion of informed consent, accountability and oversight; and, thirdly, the research is carried out with the aim of growing society’s collective knowledge and well-being, as opposed to serving primarily one or several private interests. I hope that noble Lords will recognise that these are features that the amendments before the Committee would incorporate into Clause 67.
In the circumstances, I hope that the Minister, who I know has thought deeply about these issues, will recognise that the EU’s institutions are worried about the definition of scientific research that has been incorporated into the Bill. If they are worried, I suggest that we should be worried. I hope that these amendments will allay those fears and ensure that true scientific research is encouraged by Clause 67 and that it is not abused by AI companies. I beg to move.
I thank the Minister very much, but is she not concerned by the preliminary opinion from the EDPS, particularly that the boundary with traditional academic research is blurrier than ever and that it is even harder to distinguish research which has generalisable benefits for society from that which primarily serves private interests? People in the street would be worried about that, and the Bill ought to be responding to that concern.
I have not seen that observation, but we will look at it. It goes back to my point that the provisions in this Bill are designed to be future facing as well as for the current day. The strength of those provisions will apply regardless of the technology, which may well include AI. Noble Lords may know that we will bring forward a separate piece of legislation on AI, when we will be able to debate this in more detail.
My Lords, this has been a very important debate about one of the most controversial areas of this Bill. My amendments are supported across the House and by respected civic institutions such as the Ada Lovelace Institute. I understand that the Minister thinks they will stifle scientific research, particularly by nascent AI companies, but the rights of the data subject must be borne in mind. As it stands, under Clause 67, millions of data subjects could find their information mined by AI companies, to be reused without consent.
The concerns about this definition being too broad were illustrated very well across the Committee. The noble Lord, Lord Clement-Jones, said that it was too broad and must recognise that AI developers will be open to using data research for any AI purposes, and talked about his amendment on protecting children’s data, which is very important and worthy of consideration. This was supported by my noble friend Lady Kidron, who pointed out that the definition of scientific research could cover everything and warned that Clause 67 is not just housekeeping. She quoted the EDPS and noted that its critical clarification was not included in the transfer of the scientific research definition into the Bill. The noble Lord, Lord Holmes, asked what in the Bill has changed when you consider how much has changed in AI. I was very pleased to have the support of the noble Viscount, Lord Camrose, who warned against the abuse and misuse of data and the broad definition in this Bill, which could muddy the waters. He supported the public interest test, which would be fertile ground for helping to define scientific data.
Surely this Bill should walk the line in encouraging the AI rollout to boost research and development in our science sector. I ask the Minister to meet me and other concerned noble Lords to tighten up Clauses 67 and 68. On that basis, I beg leave to withdraw my amendment.
My Lords, Amendments 66, 67 and 80 in this group are all tabled in my name. Amendment 66 requires scientific research carried out for commercial purposes to
“be subject to the approval of an independent ethics committee”.
Commercial research is, perhaps counterintuitively, generally subjected to fewer ethical safeguards than research carried out purely for scientific endeavour by educational institutions. Given the current broad definition of scientific research in the Bill—I am sorry to repeat this—which includes research for commercial purposes, and the lower bar for obtaining consent for data reuse should the research be considered scientific, I think it would be fair to require more substantial ethical safeguards on such activities.
We do not want to create a scenario where unscrupulous tech developers use the Bill to harvest significant quantities of personal data under the guise of scientific endeavour to develop their products, without having to obtain consent from data subjects or even without them knowing. An independent ethics committee would be an excellent way to monitor scientific research that would be part of commercial activities, without capping data access for scientific research, which aims more purely to expand the horizon of our knowledge and benefit society. Let us be clear: commercial research makes a huge and critically important contribution to scientific research, but it is also surely fair to subject it to the same safeguards and scrutiny required of non-commercial scientific research.
Amendment 67 would ensure that data controllers cannot gain consent for research purposes that cannot be defined at the time of data collection. As the Bill stands, consent will be considered obtained for the purposes of scientific research if, at the time consent is sought, it is not possible to identify fully the purposes for which the personal data is to be processed. I fully understand that there needs to be some scope to take advantage of research opportunities that are not always foreseeable at the start of studies, particularly multi-year longitudinal studies, but which emerge as such studies continue. I am concerned, however, that the current provisions are a little too broad. In other words: is consent not actually being given at the start of the process for, effectively, any future purpose?
Amendment 80 would prevent the data reuse test being automatically passed if the reuse is for scientific purposes. Again, I have tabled this amendment due to my concerns that research which is part of commercial activities could be artificially classed as scientific, and that other clauses in the Bill would therefore allow too broad a scope for data harvesting. I beg to move.
My Lords, it seems very strange indeed that Amendment 66 is in a different group from group 1, which we have already discussed. Of course, I support Amendment 66 from the noble Viscount, Lord Camrose, but in response to my suggestion for a similar ethical threshold, the Minister said she was concerned that scientific research would find this to be too bureaucratic a hurdle. She and many of us here sat through debates on the Online Safety Bill, now an Act. I was also on the Communications Committee when it looked at digital regulations and came forward with one of the original reports on this. The dynamic and impetus which drove us to worry about this was the lack of ethics within the tech companies and social media. Why on earth would we want to unleash some of the most powerful companies in the world on reusing people’s data for scientific purposes if we were not going to have an ethical threshold involved in such an Act? It is important that we consider that extremely seriously.
My Lords, I welcome the noble Viscount to the sceptics’ club, because he has clearly had a damascene conversion. It may be that this goes too far. I am slightly concerned, like him, about the bureaucracy involved in this, which slightly gives the game away. It could be seen as a way of legitimising commercial research, whereas we want to make it absolutely certain that such research is for the public benefit, rather than imposing an ethics board on every single aspect of research which has any commercial content.
We keep coming back to this, but we seem to be degrouping all over the place. Even the Government Whips Office seems to have given up trying to give titles for each of the groups; they are just called “degrouped” nowadays, which I think is a sign of deep depression in that office. It does not tell us anything about what the different groups contain, for some reason. Anyway, it is good to see the noble Viscount, Lord Camrose, kicking the tyres on the definition of the research aspect.