Blog - 20/01/2025
The Law and AI: The Results are in! – ICO releases outcomes of its consultation on generative AI and data protection
In early 2024 the Information Commissioner’s Office (the “ICO”) commenced a five-part generative AI consultation series, the aim of which was to address regulatory uncertainty as to how certain aspects of the UK GDPR and the Data Protection Act 2018 apply to the use of generative AI.
After reviewing the consultation responses, the ICO updated its position in a number of areas and set out the future work that needs to be undertaken in the AI industry. The ICO officially released its report on the results of its 2024 generative AI consultation series on 12 December 2024 (the “Report”), which is available here.
Generative AI Consultation Series
Generative AI is a type of AI system that can generate new text, image, audio and video content. A well-known example of generative AI is ChatGPT.
Generative AI models are usually trained on vast amounts of personal data, quite often without people being aware that their personal data is being processed. The generative AI consultation series was commenced in response to the issues generative AI presents for personal data, particularly as it relates to the lack of transparency and the risks posed to people’s rights and freedoms.
The generative AI consultation series addressed the following five key areas, which we will explore in further detail in this blog:
- The lawful basis for web scraping to train generative AI models;
- Purpose limitation in the generative AI lifecycle;
- Accuracy of training data and model outputs;
- Engineering individual rights into generative AI models;
- Allocating controllership across the generative AI supply chain.
Lawful basis for web scraping to train generative AI models
The ICO’s view is that Legitimate Interests is the only available lawful basis on which a developer/controller can rely when using web-scraped personal data to train AI models. The Report explains why the other lawful bases are unlikely to be appropriate.
Legitimate Interests is the most flexible lawful basis for processing personal data; it allows you to use people’s data in ways they would reasonably expect (and which have a minimal privacy impact), or where there is a compelling justification for the processing.
To rely on Legitimate Interests, a three-part test needs to be satisfied. A developer/controller would therefore need to carry out and satisfy the following tests to be able to lawfully use web-scraped data:
- Purpose test: the developer must set out a specific and clear interest; they need to be able to prove that they are pursuing a legitimate interest;
- Necessity test: the developer must be able to explain why web scraping is necessary and why alternative methods of data collection are not suitable, for example why personal data cannot be collected directly from people and licensed in a transparent way; and
- Balancing test: the developer needs to consider whether the individual’s interests, rights and freedoms override the developer’s legitimate interests.
The ICO noted that because web scraping is an activity that often occurs without people being aware of it, the balancing test would be difficult to satisfy in practice. Since people will not be aware that their data is being used, they would not be able to exercise their information rights, such as the right to object to the use of their data.
As a first step, the ICO has emphasised the need for generative AI developers to improve their approach to transparency when processing data, to ensure individuals can effectively exercise their rights in relation to their personal data. In practice, this would require developers to provide accessible and clear information about what personal data they collect and how it is collected. Developers would also need to carry out and satisfy the Legitimate Interests test, which requires them to consider (and document) whether web scraping is strictly necessary and why other, more transparent methods of collecting data are not appropriate.
Purpose limitation in the generative AI lifecycle
The ICO has affirmed its positions on how the purpose limitation principle should be interpreted in the generative AI lifecycle. Those positions are:
- That the purpose for training and deploying a generative AI model has to be explicit and specific, so that people have a clear understanding of why and how their personal data is processed. Using personal data to develop a generative AI model is unlikely to fall within people’s reasonable expectations.
- Developers who reuse personal data to train a generative AI system must ensure that the purpose of the model is compatible with the original purpose for which the data was collected.
- Developing a generative AI model and developing an application based on such a model constitute different purposes.
In its Report the ICO acknowledged that respondents to the consultation had different views on what qualifies as explicit and specified purposes under Article 5(1)(b) of the UK GDPR for training generative AI. In response, the ICO is considering developing guidance on how developers can demonstrate a sufficiently detailed and specific purpose when training generative AI models.
Accuracy of training data and model outputs
The ICO’s Report confirmed the requirements on developers relating to the accuracy of training data as well as the accuracy of outputs of a generative AI model.
In summary, developers are required to understand and be transparent about the accuracy of the training data they use. Since inaccurate training data will lead to inaccurate model outputs, AI developers will need to ensure that the data used to train generative AI is sufficiently accurate for its processing purposes.
The ICO noted that the appropriate level of statistical accuracy required for training data will be determined by the specific purpose for which the model will be used. For example, the level of accuracy required for a generative AI model whose outputs users rely on as a source of factual information will be much higher than for a model used to create non-factual outputs as a source of inspiration.
Developers are also required to assess the risk of unexpected and incorrect outputs of the generative AI model and take measures to communicate the level of statistical accuracy to users, for example by labelling the outputs as generated by AI and using confidence scores to provide information about the output’s reliability. On this point the ICO has said that clear communication between developers, deployers and end-users will be key to ensuring that the degree of statistical accuracy is proportional to the model’s final application.
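By way of illustration only, the sketch below shows one way a deployer might attach an “AI-generated” label and a confidence score to a model output before presenting it to a user. The data structure, function name and confidence figure are hypothetical assumptions for the purposes of the example; neither the Report nor the ICO prescribes any particular implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch: wrapping a generative model's output with the
# transparency metadata the ICO suggests (an explicit AI-generated label
# and a statistical confidence score). All names and values are illustrative.

@dataclass
class LabelledOutput:
    text: str           # the generated content
    ai_generated: bool  # an explicit "generated by AI" label
    confidence: float   # a statistical confidence score between 0.0 and 1.0

def present_output(generated_text: str, confidence: float) -> LabelledOutput:
    """Attach transparency metadata to a model output before it reaches the user."""
    return LabelledOutput(text=generated_text, ai_generated=True, confidence=confidence)

result = present_output("Example model output.", 0.87)
print(f"[AI-generated | confidence {result.confidence:.0%}] {result.text}")
```

How the confidence figure is calculated, and how the label is displayed, would depend on the model and its final application; the point the ICO makes is that this information should reach the end user in a clear form.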
Engineering individual rights into generative AI models
The ICO is concerned about the lack of transparency amongst generative AI developers and has stressed the importance of data protection by design and by default, which is a legal requirement under the UK GDPR. Developers are required to build systems that effectively implement data protection principles and integrate the necessary safeguards into their processing to protect people’s interests, rights and freedoms.
In practice, developers will need to have clear and effective processes for enabling people to exercise their rights over their personal data. This will require developers to provide accessible and clear information to people about the use of their personal data and to have measures in place to respond effectively to people’s information rights requests. It will not be enough for developers to claim they are unable to respond to a request because they cannot identify the individual. Instead, developers would be required to make reasonable efforts to identify the person, for example by offering easy ways for them to provide additional information that enables their personal data to be identified.
The ICO fully expects generative AI developers and deployers to substantially improve how they fulfil their transparency obligations towards people.
Allocating controllership across the generative AI supply chain
The ICO has stated that whether an organisation is a controller, joint controller or processor will be determined by how the organisation deals with processing in practice, and not necessarily by what is set out in a contract. For example, where the relationship between the parties involves shared objectives and influence over the processing from both parties, such as between developers and third-party developers in the “closed-source” generative AI field, this is likely to give rise to a joint controller relationship rather than a processor-controller one.
The role of an organisation will therefore need to be assessed by the level of control and influence over the purposes and means of the processing taking place.
Summary
The ICO is becoming increasingly concerned that many organisations are developing and deploying generative AI models without the measures and processes in place to uphold people’s interests, rights and freedoms. In its Report it emphasises the need for increased transparency surrounding generative AI and data protection and sets out a number of measures and suggestions to help organisations comply with their data protection obligations.
Organisations that use or develop a generative AI system should closely review the measures set out by the ICO in its Report. It will also be important to keep an eye out for any further guidance or notices published by the ICO on generative AI, particularly as the Data (Use and Access) Bill progresses through Parliament, which may have an impact on the positions set out by the ICO in its Report. For more information on the Data (Use and Access) Bill, please see our blog here.
If you have any questions or would like to discuss any of the topics in this article, please contact Nick Phillips or Selina Clifford or any member of our AI team.
Please note that this blog is provided for general information only. It is not intended to amount to advice on which you should rely. You must obtain professional or specialist advice before taking, or refraining from, any action on the basis of the content of this blog.