What Is the EU GPAI Code of Practice?
Artificial intelligence has entered a new era in which general-purpose AI (GPAI) models, such as large language models, multimodal systems, and generative AI, serve as the foundation for countless applications. Recognizing both the opportunities and risks these models present, the European Union introduced the Code of Practice for General-Purpose AI (GPAI) Models. Adherence of GPAI models to EU copyright rules is a vital requirement of the Code of Practice.
The Code is a voluntary framework. It is not legally binding in itself but is designed to:
- Help providers demonstrate compliance with the EU AI Act, especially Articles 53 and 55, which impose obligations on transparency, copyright, and systemic-risk management.
- Offer a clear orientation tool for providers navigating complex legal requirements.
- Provide the AI Office with a reference point to assess compliance for those providers who rely on the Code.
Important clarification: Adhering to the Code does not equal automatic compliance with EU law. As the Transparency Chapter stresses, only compliance with European harmonised standards creates a presumption of conformity under Article 53(4) AI Act. Still, the Code gives providers a valuable roadmap and a practical way to operationalize copyright compliance. Following up on the first post on transparency requirements of GPAI models under the Code of Practice, this post outlines the requirements for complying with the Copyright Chapter of the Code of Practice.
Why the Copyright Chapter Matters for GPAI Providers
Training GPAI models requires access to vast amounts of creative and informational content: books, articles, images, music, and more. Much of this content is protected under EU copyright and related rights.
The Copyright Chapter of the GPAI Code of Practice translates the legal obligation of Article 53(1)(c) AI Act into five concrete measures. These measures require providers to:
- Draft and maintain a copyright policy.
- Ensure lawful access when gathering training data.
- Respect rights reservations (e.g. “do not train” signals).
- Reduce the risk of infringing outputs.
- Provide rights-holders with contact and complaint mechanisms.
Together, these commitments aim to ensure that GPAI models respect fundamental rights, reduce legal risk for providers, and build trust with both rights-holders and users.
Measure 1.1 – Draw Up, Keep Up-to-Date, and Implement a Copyright Policy
What the Code Requires
Providers must create and maintain a single written copyright policy that applies to all GPAI models placed on the EU market. This policy should:
- Assign clear responsibilities for implementation and oversight.
- Be regularly updated as practices or laws evolve.
- (Encouraged) Be summarized in a public document to promote transparency.
Best Practice Blueprint
A robust copyright policy should cover:
- Scope and governance: Define which models and internal functions (e.g., legal, compliance, data engineering, research teams) the policy applies to.
- Lawful data acquisition: Clarify rules for licensing, lawful web use, and Text and Data Mining (TDM) exceptions (Directive (EU) 2019/790).
- Rights reservation compliance: Build technical and procedural systems to honor signals (see Measure 1.3).
- Safeguards against infringing outputs: Apply dataset filtering and deduplication to reduce the presence of copyrighted material and to keep models from memorizing or reproducing repeated works (see Measure 1.4).
- Contact points and complaints: Ensure rights-holders know how to reach you (see Measure 1.5).
- Auditability: Keep version histories and audit trails to show good-faith compliance.
Measure 1.2 – Reproduce and Extract Only Lawfully Accessible Content
What the Code Requires
When collecting training data, providers must:
- Use crawlers that only access lawfully available content.
- Not bypass paywalls or technical protection measures (Article 6(3) of Directive 2001/29/EC).
- Exclude websites formally recognized by courts or public authorities in the EU and EEA as persistently infringing.
Implementation Checklist
- Maintain allow/deny lists of domains, synced with official EU lists.
- Log crawler activity (timestamps, status codes, access headers).
- Ensure contractors or third-party crawlers comply with these rules.
- Automate checks to prevent unintentional scraping of restricted content.
This measure is about responsible sourcing: ensuring training pipelines are free from obvious copyright violations.
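The deny-list and logging items in the checklist can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the domain `infringing.example`, the function names, and the log fields are all hypothetical, and a real pipeline would load its deny list from whatever official sources it tracks.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical deny list; in practice this would be loaded from a maintained source.
DENIED_DOMAINS = frozenset({"infringing.example"})


def is_denied(url: str, denied: Iterable[str] = DENIED_DOMAINS) -> bool:
    """True if the URL's host is a denied domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in denied)


def crawl_log_entry(url: str, status_code: Optional[int],
                    skipped_reason: Optional[str] = None) -> dict:
    """Build one audit-log record with a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "status_code": status_code,
        "skipped_reason": skipped_reason,
    }


def check_before_fetch(url: str) -> Optional[dict]:
    """Return a log entry if the URL must be skipped, otherwise None."""
    if is_denied(url):
        return crawl_log_entry(url, None, "denied_domain")
    return None
```

Calling `check_before_fetch` ahead of every request keeps the decision and its log record in one place, which makes later audits simpler.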
Measure 1.3 – Identify and Comply with Rights Reservations
What the Code Requires
Providers should use crawlers that respect machine-readable signals reserving rights. These include:
- Robots.txt files (see IETF RFC 9309).
- Metadata or headers that communicate “do not use for text and data mining.”
- Rights reservations according to other state-of-the-art standards, such as the W3C TDM Rep Protocol.
Providers should also:
- Publish crawler user-agent details.
- Provide a mechanism for rights-holders to receive updates (e.g. via a syndication feed).
- Where applicable, ensure separation between search indexing and training data crawling.
Why This Matters
- Legal basis: Article 4(3) of the DSM Directive allows rights-holders to reserve their works “in an appropriate manner,” including machine-readable form.
- Practical importance: These signals ensure creators retain agency over their works in the age of AI.
- Transparency: Publishing crawler documentation helps avoid disputes and builds trust.
Implementation Notes
- Use distinct user-agent identifiers for training-related crawlers.
- Enforce compliance at two stages: at collection (don’t fetch blocked content) and at processing (exclude any mistakenly collected content).
- Maintain versioned records of honored rights signals and run regression tests against real-world robots.txt files before deploying updates.
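As a rough sketch of the two-stage check, the following Python uses the standard library's `urllib.robotparser` for robots.txt and a simple header lookup for a TDM reservation. The user-agent name `ExampleTrainingBot` is hypothetical, and the `tdm-reservation` header with value `1` reflects our reading of the W3C TDM Reservation Protocol; verify the exact signal names against the current specification before relying on them.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent for a training-data crawler.
TRAINING_USER_AGENT = "ExampleTrainingBot"


def robots_allows(robots_txt: str, url: str,
                  user_agent: str = TRAINING_USER_AGENT) -> bool:
    """Collection stage: may this user-agent fetch the URL per robots.txt?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: Dict[str, str]) -> bool:
    """Processing stage: did the response carry a TDM rights reservation?

    Assumes a 'tdm-reservation: 1' response header signals a reservation.
    """
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def may_use_for_training(robots_txt: str, url: str,
                         headers: Dict[str, str]) -> bool:
    """Combine both checks; either signal alone is enough to exclude."""
    return robots_allows(robots_txt, url) and not tdm_reserved(headers)
```

Because `robots_allows` takes the robots.txt text as an argument rather than fetching it, the same function can be run in regression tests against stored copies of real-world files.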
Measure 1.4 – Mitigating Infringing Outputs
What the Code Requires
Even with lawful data sourcing, there’s a risk that a model might reproduce copyrighted works verbatim. To avoid this, providers must:
- Adopt technical safeguards to prevent such outputs.
- Explicitly prohibit infringing uses in terms of service and documentation.
Implementation Suggestions
- Use memorization detection tools to identify when outputs replicate training data.
- Apply dataset deduplication and filtering during training.
- Introduce contractual clauses making downstream users responsible for compliance.
- Provide usage guidelines to customers, highlighting acceptable and prohibited uses.
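To make the first two suggestions concrete, here is a deliberately simple Python sketch: exact deduplication by hashing normalized text, and a word n-gram overlap score as a crude proxy for verbatim reproduction. The function names and the default n-gram length of 8 are illustrative assumptions; production systems typically use more robust techniques such as near-duplicate detection.

```python
import hashlib
from typing import List, Set, Tuple


def deduplicate(documents: List[str]) -> List[str]:
    """Drop documents whose case- and whitespace-normalized text was already seen."""
    seen: Set[str] = set()
    unique: List[str] = []
    for doc in documents:
        normalized = " ".join(doc.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """All runs of n consecutive lowercased words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 8) -> float:
    """Share of the output's n-grams that also appear in the reference text."""
    output_ngrams = word_ngrams(output, n)
    if not output_ngrams:
        return 0.0  # output shorter than n words: nothing to compare
    return len(output_ngrams & word_ngrams(reference, n)) / len(output_ngrams)
```

A ratio close to 1.0 means the output largely repeats the reference verbatim; where to set an alert threshold is a policy decision that this sketch does not settle.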
Measure 1.5 – Contact Point and Complaints Mechanism
What the Code Requires
Providers must:
- Designate a point of contact for rights-holders.
- Publish this information clearly.
- Set up an electronic complaints mechanism (e.g., web form or email).
- Keep logs of complaints and responses.
Implementation Suggestions
- Use an internal ticketing system to manage and escalate complaints.
- Establish SLAs (service-level agreements) for response times.
- Regularly publish effectively anonymized transparency reports on complaint handling.
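A complaint log with a response deadline can be modeled with a small data class. The sketch below is only one way to do it: the 14-day window, the field names, and the `Complaint` class are hypothetical choices, since the text above does not prescribe a response time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical internal SLA; the actual target is for each provider to define.
RESPONSE_SLA = timedelta(days=14)


@dataclass
class Complaint:
    complaint_id: str
    rights_holder: str
    description: str
    received_at: datetime
    responded_at: Optional[datetime] = None

    @property
    def due_at(self) -> datetime:
        """Deadline for a first response under the internal SLA."""
        return self.received_at + RESPONSE_SLA

    def is_overdue(self, now: datetime) -> bool:
        """True if no response has been recorded and the deadline has passed."""
        return self.responded_at is None and now > self.due_at
```

For example, a complaint received on 1 January 2025 would be flagged by `is_overdue` from 16 January onward unless `responded_at` has been set.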
Practical Benefits for Providers
For Signatories
The Code of Practice for GPAI Models:
- Establishes a structured compliance framework aligned with EU law.
- Builds trust with regulators and rights-holders.
- Reduces legal exposure and reputational risk.
For Non-Signatories
- Offers a blueprint for responsible practice even if you don’t adopt the Code formally.
- Prepares you for future regulatory scrutiny.
- Differentiates you as a responsible player in the AI market.
Real-World Examples
Some providers have begun adapting their practices in line with the Copyright Chapter of the EU Code of Practice for GPAI Models.
- OpenAI:
  - Introduced an opt-out mechanism for publishers via robots.txt and related signals.
  - Implemented licensing deals with publishers (e.g., Axel Springer, Associated Press) to access high-quality data lawfully.
- Google DeepMind:
  - Publicly documents its crawlers and encourages rights-holders to manage access via robots.txt (Google Search Central).
These examples show that compliance is both feasible and already underway, even beyond formal Code signatories.
Challenges Providers May Face
- Technical complexity: Building and maintaining robust crawler compliance systems.
- Output detection: Preventing memorization is technically demanding.
- Legal uncertainty: National differences in copyright implementation may cause ambiguity.
Still, the trajectory is clear: compliance will become the standard baseline for all GPAI providers.
Conclusion & Call-to-Action
Copyright compliance is not a side issue; it is a pillar of “trustworthy” AI. The GPAI Code of Practice provides a clear, practical roadmap through Measures 1.1–1.5.
By implementing them, providers can:
- Reduce legal risks.
- Build trust with rights-holders and regulators.
- Strengthen their reputation in a competitive market.
Call-to-Action
- Providers: Audit your data pipelines, draft a copyright policy, and publish your contact details today.
- Downstream companies: Demand documentation from your model providers.
- Innovators: Treat copyright compliance not as a burden, but as a competitive advantage.
The EU AI Act is a complex regulatory framework. If you are developing AI models, make sure that you don’t neglect legal compliance.