In my previous post, I provided some insights on the legal guardrails governing the use of copyrighted works for training AI models and the ongoing discussion of this issue in the United States and the European Union. This follow-up article addresses another question of no lesser importance: Is it legal to use personal data for training AI models?
A court ruling anyone in AI should know about
As shown in my previous post, training AI models still requires large volumes of data. As a rule, such data used to be collected by ‘scraping’ websites and gathering suitable material from various internet sources, regardless of potential copyright infringements. While awareness of this issue is likely to increase now that the relevant provisions of the EU AI Act on general-purpose models have come into effect, a related question has received less attention: Is it legal to use personal data to train AI models? There is good news on this front, as the Higher Regional Court of Cologne in Germany recently ruled in favour of Meta on exactly this issue. This article provides an overview of the facts underlying the case, the court’s decision, and a curated summary of the main reasons for the judgment.
The facts and what the court decided
A consumer protection agency applied for a preliminary injunction against Meta, seeking to prohibit the company from processing personal data published by consumers on Facebook and Instagram for the purpose of developing and improving its AI systems. However, the court ruled that Meta’s intended processing of such data was lawful, as it was deemed necessary for the legitimate interests pursued by Meta, and the interests of the data subjects were not considered to override these legitimate interests (Article 6(1)(f) of the EU General Data Protection Regulation).
The reasons for Meta’s victory
The court provided a comprehensive analysis of all the arguments raised by the parties involved, as well as a detailed explanation of the reasons behind the decision. As the judgment shows, Opinion 28/2024 of the European Data Protection Board played a vital role in this case. In this opinion, the European Data Protection Board recognised that Article 6(1)(f) of the GDPR could serve as a legal basis for training AI models with personal data, provided that three conditions are met:
- there must be a legitimate interest pursued by the controller or a third party to process the personal data for the envisaged purpose;
- the processing must be necessary for the purposes of the legitimate interest(s) pursued (also referred to as “necessity test”); and
- the legitimate interest(s) must not be overridden by the interests or fundamental rights and freedoms of the data subjects (also referred to as the “balancing test”).
Applying this three-step test, the court concluded that the following aspects spoke in favour of the legality of processing personal data to train Meta’s AI models:
Meta had a legitimate interest in training its AI models with Facebook and Instagram user data (step 1)
Passing the first step of the three-step test did not prove to be a significant hurdle for Meta: the Hamburg Commissioner for Data Protection and Freedom of Information, who attended the oral hearing, had adopted the prevailing legal opinion that a legitimate interest in processing personal data for training AI models can generally be assumed. Meta claimed that it intended to use generative AI to provide a chatbot to assist users with various tasks, such as holiday planning or creating text, images and audio files. To this end, Meta intended to adapt its AI to regional customs. The court found that Meta had thus clearly and precisely formulated its interest in processing the personal data, and that it had demonstrated the reality and current relevance of this interest.
Moreover, Meta provided comprehensive evidence that it had complied with all its other obligations under the GDPR, including the principles of data accuracy, purpose limitation and transparency enshrined in Articles 5 and 12 GDPR. The court was also convinced that training Meta’s AI with user data would achieve the goal of offering optimized generative AI according to regional customs.
The processing of personal data was necessary to achieve its purpose (step 2)
Interestingly, the court also accepted Meta’s claim that it had considered alternative ways of achieving the purpose of the data processing, but that no sensible alternatives existed that would permit a less intensive use of personal data while still achieving its objectives. The court acknowledged that the intended processing of Facebook and Instagram user data was necessary to achieve the intended purpose, also because:
- Recital 105 of the EU AI Act states that the development and training of generative AI models require access to vast amounts of text, images, videos and other data;
- Meta’s claim that it had no practical way to anonymise the personal data of Facebook and Instagram users was supported by legal literature, which likewise considers the anonymisation of such data impractical;
- Meta had submitted an affidavit from its Director of GenAI Products confirming that Meta would obtain an inferior product if it were forced to use less personal data, and that synthetic data was not equivalent to real user data from Facebook and Instagram;
- Meta had no legal obligation to prove that processing each and every data point was necessary to achieve the purposes of processing, because the court found that training AI requires bulk data to generate patterns and probability parameters, so that a single data point has no measurable impact on this process;
- Meta had shown that it had implemented measures to “de-identify” all personal data in training data sets; and
- obtaining current data through web scraping and web crawling would have resulted in significantly greater interference with the rights of data subjects as the measures taken by Meta to mitigate the effects of data processing would not apply to these data collection methods.
There were no overriding interests of Facebook and Instagram users (step 3)
The final step in relying on Article 6(1)(f) of the GDPR as a legal basis for processing personal data in training AI models involves balancing the interests of the controller with those of the data subjects. In this case, the court referred to the ruling of the European Court of Justice (ECJ) in another Meta case (C-252/21), in which the ECJ stated that the reasonable expectations of the data subject, as well as the scale of the processing in question and its impact on that individual, must be considered when balancing interests. In the case discussed in this article, Meta passed this test with flying colours for the following reasons:
- Meta processed personal data that had already been published on Facebook or Instagram by the data subjects, meaning there was no risk of new social or professional disadvantages arising for Facebook and Instagram users from the disclosure of such data.
- Meta had tokenized the training data in an unstructured format after removing personally identifiable information, such as data subjects’ names, email addresses, phone numbers, national ID numbers, user IDs, credit and debit card numbers, bank account numbers, BICs, car licence plate numbers and IP addresses.
- Meta demonstrated that it had taken suitable technical, physical and organizational measures to prevent unauthorized access to the training data.
- Data subjects had ample opportunity to prevent their personal data from being used to train Meta’s AI models, for example by revoking the public status of their Facebook and Instagram data. They could also object to this use of their data within six weeks of being transparently informed about Meta’s intended processing.
- Prohibiting the use of personal data for training purposes would undermine the European lawmaker’s intention of creating “a uniform legal framework for AI” and “being a global leader in the development of secure, trustworthy and ethical AI”, enshrined in recitals 8 and 1 of the EU AI Act.
- Due to the size of the training data sets, it was unlikely that individual data subjects would be identifiable in the course of processing their data.
- Meta had stated that it would not use the same personally identifiable information for future training of its AI models where a data subject had requested its deletion.
- It was very likely that personally identifiable information would “vanish” in the vast amounts of training data and not appear in the output of Meta’s AI systems.
- Subject to further consideration by the European Court of Justice, the court assumed that even the processing of special categories of data, as defined in Article 9 of the GDPR, such as the health data of third parties not using Meta’s platforms, was permissible, provided that these third parties had not requested the removal of their personal data from a post published by a registered Facebook or Instagram user.
- Training AI models does not involve the targeted processing of personal data or the identification of individuals, as is the case with high-impact profiling of data subjects according to Article 4 of the GDPR.
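The “de-identification” measures the court credited amount to stripping direct identifiers from text before it is tokenised. As a purely illustrative sketch, and emphatically not Meta’s actual pipeline, a minimal regex-based scrubber for a few of the identifier types mentioned in the judgment might look like this:

```python
import re

# Illustrative only: crude patterns for three identifier types listed in
# the judgment (email addresses, phone numbers, IP addresses). Production
# de-identification pipelines are far more sophisticated than this.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s\-()]{7,}\d"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def de_identify(text: str) -> str:
    """Replace each matched identifier with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or +49 221 1234567 from 192.168.0.1"
print(de_identify(sample))
# → Contact Jane at [EMAIL] or [PHONE] from [IP]
```

Note that such pattern matching only removes direct identifiers; as the court’s reasoning acknowledges, residual identifiability is instead mitigated by the sheer scale of the training data.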
So, is it legal to use personal data for training AI models? If you check all the factors addressed by the Higher Regional Court of Cologne and pass the three-step test for the application of Article 6(1)(f) GDPR, the answer is yes. As a caveat, this judgment should be taken with a grain of salt, as it was rendered in preliminary injunction proceedings; the main proceedings may still have a different outcome. Moreover, the permissibility of such processing under the GDPR ultimately needs to be decided by the European Court of Justice, which will have the final say on the matter.
This blog may revisit the permissibility of training AI models with personal data when new court rulings or other noteworthy material becomes available. If you are in the business of developing AI models, make sure that you don’t neglect legal compliance.