Baroness O'Neill of Bengarve
Baroness O'Neill of Bengarve (Crossbench - Life peer)
Department debates: Home Office
(13 years, 9 months ago)
Grand Committee
Baroness O'Neill of Bengarve
        My Lords, I want to speak to Amendments 147A, 147B, 148A, 148C and 148D. I will also comment, but much more briefly, on the more comprehensive Amendment 151, in the names of the noble Baronesses, Lady Brinton, Lady Warwick and Lady Benjamin, which I support, and I will comment very briefly on one of the amendments in the name of the noble Lord, Lord Lucas. Before doing so, I would like very much to thank the Minister and the Bill team for their exemplary courtesy and helpfulness in explicating their thinking on Clause 100—not, I think, the simplest clause of the Bill. If we have not reached agreement, it is not for lack of effort on their part.
Secondly, I would like to make it entirely clear that I am in favour of making scientific data more open. Science needs openness for its own purposes; it needs to have open data so that it is possible for others to check and challenge, and openness allows data to be put to unanticipated uses. Therefore, I am very much in sympathy with the overall purpose of this part of the Bill. Of course, it used to be feasible, and it was standard practice, to publish data within articles in scientific journals. That is no longer feasible because of the size and complexity of many scientific data sets, so openness now has to be sought in other ways.
However, I believe that the Bill is based on too confident a view of the effectiveness and adequacy of the system of exemptions established in the Freedom of Information Act 2000 and of their capacity to avoid undesirable and unintended effects—particularly in this area, which is essentially that of scientific databases. Clause 100 proposes a seemingly minor, but in fact very substantial, change in the application of the freedom of information requirements to the release of data sets by public authorities. I will not at this stage say anything further about the use of the term “public authority”, as I think that we all understand that this means a publicly funded authority, which may, however, be a research institution or university that also has charitable status.
On the surface, Clause 100 simply requires the release of data sets in reusable electronic form, but I believe that in practice its demands will create a number of risks and problems. Let me therefore begin with Amendment 147A. The present drafting of the clause is, I believe, ambiguous, in that it requires data to be released upon request if the data are, or form part of, a data set held by a public authority. Amendment 147A seeks to restrict that requirement to “completed” parts of a data set held by a public authority. While it is reasonable to require that completed parts of still incomplete data sets be disclosed if requested—for example, the data pertaining to a past year in a continuously updated series—there is no benefit to anybody in disclosing an incomplete part of a data set. Indeed, requiring disclosure of incomplete parts of data sets could be misleading as well as damaging to research projects and to those provided with the incomplete, and perhaps misleading, data.
The clause would currently require disclosure of data sets while data were still being entered and had not yet been checked. At that stage, the incomplete part of the data set might be misleading. To take the example of a multi-centre clinical trial, requests for disclosure of incomplete parts of the data set could lead to the release of data that related only to a distinctive subset of patients whose data happened to become available at an earlier stage than those of other subsets of patients whose results might differ; that is, after all, the reason why the structure of clinical trials is quite elaborate. Such misleading releases might, I fear, falsely raise or dash the hopes of patients suffering from a serious condition, who might read the released incomplete data as giving them grounds for hope or despair.
I think that this issue arises because the drafting actually conflates two very different types of incompleteness in data sets. A data set may be incomplete because it relates to an ongoing project. In this case, completed parts of that data set relating, for example, to completed periods or phases in the project may indeed be available and could be released upon request.
In the second case, a data set or parts of a data set may be incomplete because the data are not yet fully available for entry, have not yet been entered or have not yet been checked. It could be highly misleading to require disclosure in the second case. Amendment 147A seeks to limit such requirements to disclose to the completed parts of data sets, where the danger of misleading is less.
Secondly, Amendment 147B requires that access is provided on request to data sets in reusable electronic form. Again, I stress that this is in principle an admirable thought. Where a data set is, for example, a relatively simple spreadsheet, this requirement would create no more difficulty for research databases than it does for government data sets. However, some scientific data sets are orders of magnitude larger and do not use standard software; even if it is feasible, it may be extremely costly to render them usable by others or, indeed, reusable even by others with technical skills. We have to remember that those of whom data are requested will not know the skills of those who request them. In such cases it may be necessary to provide metadata or to process data further in order to make access to them more feasible even for competent others. It is more usual to make research data available by archiving data sets or by setting out a publication or so-called data sharing scheme that will provide access for others and also secure the crucial benefits of professional data curation and data security.
Amendment 148B will permit holders of research data to undertake to provide those data using these normal and reliable routes. At present, the Freedom of Information Act grants an exemption once data sets have already been placed in the public domain in this way, such as in a data archive or through a data sharing scheme. This amendment seeks to postpone access where such archiving is not merely foreseen but is something that data holders have undertaken to provide. In effect, it would create a temporary exemption for the data concerned. The Minister might see this as an opening for procrastination. However, if he is sympathetic to the realities of the problem, he might perhaps wish to consider at least a version of the amendment that offers a limited time for this exemption—for example, six months after the completion of the relevant research project or phase of the research project. It is a question of trading off quality for instant gratification, I suppose.
Amendment 148A concerns the charging of fees. It seeks to address the real financial implications of seeking to make large and complex data sets available for reuse. The Bill provides for the charging of fees but does not allow public authorities to take account of the real costs of making data available to others. These costs may include not only additional checking and making metadata available but above all, and this is the main concern in the scientific community, the diversion of highly skilled and specialised time from research projects to the satisfaction of freedom of information requests. I have drafted the amendment to make it clear that it is the real costs of disclosure that matter. As noble Lords will have noted from the very helpful briefing provided for this section of the Bill by Universities UK, these costs can be very significant. It would not be reasonable, in my view, to require research projects or universities to bear these costs, which they cannot in principle have known about when seeking and obtaining the funding to do the research.
The last two amendments to which I shall speak very briefly are Amendments 148C and 148E, which are relatively uncontroversial. At present, the only operation that the Bill permits on data sets prior to required disclosure is calculation. That is just unrealistic. Those who compile data sets also need to check the data, which will be done using a variety of methods, and take steps to ensure data integrity and security, particularly at the point at which data are to be disclosed on request. Amendment 148C provides for this; Amendment 148E is consequential on Amendment 148C.
On Amendment 148, tabled by the noble Lord, Lord Lucas: from what I have already said, and from what the UUK briefing (now supported by the Academy of Medical Sciences, the Wellcome Trust and other scientific and medical bodies) has documented, the complexity of scientific databases rules out a solution along these lines. It would be very nice if it were feasible, but I believe that it is not feasible.
Amendment 151, tabled by the noble Baronesses, Lady Brinton, Lady Benjamin and Lady Warwick, is a substantial amendment. It takes the more radical step of seeking to define an additional exemption to freedom of information requirements and in the process achieves a number of the specific objectives that I have tried to achieve by more economical means in the amendments that I have tabled. However, their approach has one great advantage, which I believe—although I have racked my brains on this one—cannot be achieved by the more modest approach that I have taken. It recognises the risks to UK science and business and to the personal safety of researchers in certain fields—for example, involving work with animals—and to research subjects that will be created by Clause 100 if it is not amended. We are simply being naive if we imagine that we can rely on all those who request data respecting the intellectual property of those whose efforts produce data sets. We no longer live in a world where that is true, and we can all imagine many scenarios in which data disclosure is sought on behalf of others who work in jurisdictions where intellectual property is widely disrespected, with the aim of getting a free ride on the basis of work done by others without the payment of any fees. In those jurisdictions, legal remedies are not effective. I look forward to hearing a great deal more about Amendment 151. I beg to move.
Lord Lucas
My Lords, I have a clutch of amendments in this group. I will not at this moment comment on those proposed by the noble Baroness, Lady O’Neill, although I am looking forward to listening to others’ contributions on that subject. But it is very important that when a group of scientists ask us as a Government or community to take action based on results that they have published, the data underlying those results are open to scrutiny. I understand that that has a difficult interaction with the questions raised by the noble Baroness, but I look forward to others’ contributions on how to solve that.
The first amendment that I have in the group is Amendment 148. I should declare that I am an extensive user of freedom of information legislation, particularly as regards universities, which I have found unutterably tiresome and difficult to deal with. One of their more tiresome habits is to refuse to provide information in anything other than PDF format. They get it in Excel, or whatever form, and translate it into PDF to provide it to me, merely to cause me extra work. I have to buy a program to suck it out of the PDF again. PDF is not a transmissible format, as it were, and they are merely trying to make life difficult by putting it in that format. So I would like to be sure that when data are provided they are provided in a properly reusable format. I have never come across a data set that cannot be reduced to tab-delimited text. Maybe that happens in a collection of tables, but data are essentially a simple thing. Although the data may be held in an immensely complex form in the program that the scientists are using, in any program that I have come across it should be easy, if only for the purposes of sharing with other people, to drop out at least the base data into relatively simple form.
Baroness O'Neill of Bengarve
My Lords, I too am very grateful for the offer of a further meeting. I am slightly puzzled because I thought I had gone a considerable way to meet the very specific objections that the Minister made in his letter to my previous drafts of these amendments, and which members of the Bill team have also made. They are very narrow amendments and have a considerable protective implication, because I have suggested not that incomplete databases but that incomplete parts of databases should not be released. If one thinks through the difference between the two, one sees that whereas it might be open to a public authority to go on saying, “Oh, our database is incomplete, we are perfecting it, we are polishing it, we are taking it into the next time period,” it could not say the same of each part of a database. So I believe that that move achieves the purposes of open data while not undermining them by licensing the disclosure of data that then have to be pulled back with the comment, “Well, it was only 10 per cent of the data points you got, because that is what we had when your request was granted.” It is a substantial amendment. Nevertheless, I beg leave to withdraw Amendment 147A.