1. Introduction
On 11 November 2025, the Munich Regional Court issued a landmark ruling in GEMA v OpenAI (LG München I, Az. 42 O 14139/24), holding that OpenAI’s ChatGPT had infringed German copyright law by reproducing song lyrics whose rights are administered by the collecting society GEMA. The decision is significant for a reason that goes beyond its immediate outcome: it rests on a specific legal theory of memorisation, under which a generative AI model can itself embody a copyright-relevant reproduction if it is capable of regenerating protected content upon request.
This article analyses that reasoning and places it in the broader context of the ongoing debate on memorisation in generative AI law. It then examines a dimension that legal commentary has so far largely overlooked: the rapid development of technical methods aimed at preventing or eliminating memorisation in AI models. If these methods succeed, the factual premise on which the Munich court’s reasoning rests may soon no longer hold — with significant consequences for how EU copyright law applies to AI training.
2. The Legal Debate on Memorisation
The question of whether an AI model “memorises” its training data is deceptively simple. At its core, it asks whether the result of training — the model — constitutes a form of reproduction under copyright law, even if the original works are not stored in the conventional sense of a file saved to a hard drive.
One influential position, developed by Cooper and Grimmelmann (2024), holds that memorisation is a property of the model itself: it occurs when a near-exact copy of a training work can be reconstructed by querying the model, regardless of whether the user intended to extract it. On this view, the fact that the model can regenerate protected content is evidence that something legally equivalent to a copy is already present in the model’s parameters. This shifts the copyright question from the output stage (what the chatbot says) to the training stage (what the model contains).
A contrasting view, advanced by Larroyed (2025), challenges the use of memorisation as a legal concept altogether. In her analysis, the term has migrated too uncritically from computer science into copyright law, where it is often treated as synonymous with storage or retention. What exists inside a model, she argues, is not a stable and identifiable copy but the result of statistical processes — optimisation, correlation, parameter adjustment. To say that the model “contains” a work is, in this view, to rely on a misleading metaphor: the file inside the machine. The legal danger lies in allowing that intuitive image to determine the outcome of copyright analysis.
The disagreement between these two approaches is not merely technical. It reflects a deeper methodological choice about what should count for copyright purposes: model behaviour (can it regenerate the work?) or internal architecture (does it store the work?). Both perspectives have merit.
3. The Munich Court’s Reasoning
In GEMA v OpenAI, the Munich court adopted a broad understanding of reproduction. Drawing on Court of Justice of the EU case law, it held that reproduction does not require direct perceptibility: indirect perceptibility suffices. Applied to AI training, the court reasoned that even if outputs are generated probabilistically, the mere possibility of reproducing a protected work from the model is sufficient to constitute a reproduction within the meaning of German copyright law, which transposes Article 2 of the EU InfoSoc Directive (paras. 188–189).
The court’s approach closely follows the Cooper/Grimmelmann line: the decisive criterion is reproducibility, that is, the model’s demonstrable capacity to regenerate the protected work. In GEMA, this was established by showing that ChatGPT could reproduce song lyrics near-verbatim without internet access, proving they were embedded in the model’s parameters rather than retrieved in real time.
The court also rejected OpenAI’s text and data mining (TDM) defence. Under Articles 3 and 4 of the EU Copyright in the Digital Single Market (CDSM) Directive, reproductions made in the course of TDM activities may be permitted. However, the court held that the memorisation of the lyrics — their reproducible embodiment in the model (para. 186) — went beyond what these exceptions allow, since the resulting reproduction was neither temporary nor limited to what was strictly necessary for the mining process.
The decision is a first-instance judgment and is currently under appeal. Its authority is therefore limited. Nevertheless, it represents the most direct judicial engagement to date with the question of whether AI model parameters can constitute an act of copyright reproduction under EU law.
4. A Different View: Getty Images v Stability AI
A contrasting judicial approach emerged from the English High Court in Getty Images v Stability AI [2025] EWHC 2863 (Ch). Addressing whether an AI model trained on infringing copies could itself constitute an infringing copy under UK copyright law, Mrs Justice Joanna Smith answered in the negative. The model weights, the court held, were not an infringing copy but rather the product of statistical patterns learned during training. The mere fact that the training process involved reproductions of copyright works did not make the resulting model a copy of those works (paras. 599–600).
Care is needed before treating these two decisions as straightforward opposites. The evidentiary situations differed: in GEMA, the claimant demonstrated actual reproducibility of specific works; in Getty, the focus was on the legal characterisation of the model as an object, not on its capacity to regenerate particular content. Moreover, the English court was not bound by CJEU case law and applied a distinct statutory concept. The divergence nonetheless illustrates the persistence of two fundamentally different legal instincts about what memorisation means for copyright law.
5. The Technical Counterpoint: Mitigating Memorisation
A further consideration, which bears directly on the legal assessment above, concerns recent technical developments aimed at reducing or eliminating memorisation altogether (EUIPO, 2025, p. 156). The Munich court’s reasoning is premised on a factual finding: that the model was capable of reproducing the lyrics. But what if that capability could be systematically removed?
Contemporary research in machine learning has produced three main categories of techniques for this purpose.
Dataset curation addresses the problem at its source, before training begins. The most common method is deduplication: removing repeated instances of the same content from the training dataset. Since memorisation is strongly correlated with how often a piece of content appears in training data, deduplication significantly reduces the statistical likelihood that the model will learn to reproduce it verbatim (Lee et al., 2022).
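By way of illustration only, the deduplication step can be sketched in a few lines of Python. The sketch assumes a deliberately simple criterion (documents are treated as duplicates when they are identical after lowercasing and whitespace normalisation); the function names and the sample corpus are hypothetical, and production pipelines typically rely on more sophisticated exact-substring and near-duplicate matching.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(documents: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalised document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        # Hash the normalised text so only fixed-size digests are stored.
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Hypothetical corpus: the first two entries differ only in case and spacing.
corpus = [
    "La la la, a hypothetical lyric",
    "la la la,   a hypothetical LYRIC",
    "An unrelated sentence",
]
deduped = deduplicate(corpus)
```

In this toy corpus the second entry is dropped, so the model would encounter the lyric once rather than twice during training.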
Training architecture modifications intervene during the training process itself. One technique involves randomly excluding subsets of tokens (the basic units into which text is divided before processing) from the statistical predictions the model is trained to make. If the model is never trained to predict certain sequences in full, it is considerably less likely to reproduce them in full (Hans et al., 2024).
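The idea of excluding tokens from the training signal can be made concrete with a minimal sketch. It assumes that the model's predicted probability for each correct token is already available as a plain list of numbers; the names make_mask, masked_loss and drop_every are hypothetical, and the random mask is a simplification rather than the exact scheme described by Hans et al.

```python
import math
import random


def make_mask(num_tokens: int, drop_every: int, seed: int = 0) -> list[bool]:
    """Return True for tokens that contribute to the loss.

    On average, one token in `drop_every` is excluded at random.
    """
    rng = random.Random(seed)
    return [rng.random() >= 1.0 / drop_every for _ in range(num_tokens)]


def masked_loss(token_probs: list[float], mask: list[bool]) -> float:
    """Mean negative log-likelihood over the tokens kept by the mask."""
    kept = [-math.log(p) for p, keep in zip(token_probs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical probabilities the model assigns to three correct tokens.
probs = [0.5, 0.25, 0.5]
# The middle token is excluded, so the model receives no signal to learn it.
loss = masked_loss(probs, [True, False, True])
```

Because the excluded tokens never enter the loss, the model is not rewarded for predicting them, which is what makes complete verbatim recall of a passage less likely.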
Machine unlearning applies to models that have already been trained. Rather than retraining from scratch, these techniques aim to selectively remove the influence of specific training content on the model’s parameters. A well-managed unlearning process produces a revised model that retains its general performance while no longer being capable of reproducing the targeted material (Sakarvadia et al., 2024).
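A toy example may help to convey the intuition. The sketch below assumes a one-parameter model y = w * x and uses one simple family of unlearning methods, in which the loss on the retained data is reduced while the loss on the data to be forgotten is increased. All names, data points and the weighting alpha are hypothetical, and the sketch is not the procedure evaluated by Sakarvadia et al.

```python
Pairs = list[tuple[float, float]]


def loss(w: float, data: Pairs) -> float:
    """Summed squared error of the model y = w * x on `data`."""
    return sum((w * x - y) ** 2 for x, y in data)


def grad(w: float, data: Pairs) -> float:
    """Derivative of `loss` with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data)


def train(w: float, data: Pairs, steps: int = 200, lr: float = 0.01) -> float:
    """Ordinary gradient descent on all of the data."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


def unlearn(w: float, retain: Pairs, forget: Pairs,
            alpha: float = 0.1, steps: int = 200, lr: float = 0.01) -> float:
    """Descend the retain loss while ascending the forget loss."""
    for _ in range(steps):
        w -= lr * (grad(w, retain) - alpha * grad(w, forget))
    return w


# Hypothetical data: the retained points follow y = 2x; the point to be
# forgotten is an outlier that pulls the trained parameter upwards.
retain = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
forget = [(1.0, 10.0)]
w_trained = train(0.0, retain + forget)
w_unlearned = unlearn(w_trained, retain, forget)
```

After unlearning, the parameter fits the retained data more closely and the forgotten point less closely than before, without the model having been retrained from scratch. Whether comparable guarantees hold for large language models is an empirical question that this toy example does not answer.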
6. Implications for Copyright Law
These technical developments have significant legal implications, particularly for the Munich court’s reasoning. If memorisation can be technically prevented or eliminated, the factual premise of the GEMA decision — that protected content is reproducibly embodied in the model — loses its general validity. A model designed and verified to be incapable of reproducing protected works would, on the court’s own logic, not constitute a reproduction of them.
This also has consequences for the TDM exception analysis. One of the structural objections to applying Articles 3 and 4 CDSM to AI training is that the resulting model may itself constitute a retention of reproductions beyond what the exception permits. If mitigating techniques can significantly reduce or eliminate that retention, this objection is correspondingly weakened. The exception becomes more plausibly applicable — not because the legal framework changes, but because the technical facts do.
Two caveats are in order. First, the effectiveness of current mitigating techniques is not absolute: they reduce the probability of verbatim reproduction but may not eliminate it entirely. Second, the legal question does not reduce entirely to a technical one. Even a model with a very low probability of regenerating protected content may still, in some cases, produce near-exact outputs — and the law may still regard those cases as relevant, regardless of whether they result from memorisation in the technical sense.
A broader concern is that copyright doctrine should not become hostage to the state of the technology at a particular moment. As models evolve and memorisation is increasingly mitigated, courts and legislators will need to develop legal standards capable of responding to a moving technical target.
7. Conclusion
The GEMA v OpenAI decision is an important contribution to the emerging European case law on AI and copyright. Its core insight — that reproducibility, not file storage, is the relevant criterion for assessing whether a model constitutes a reproduction — is analytically coherent and consistent with a broad reading of the EU reproduction right.
However, the decision rests on a factual premise that is not permanent. The rapid development of techniques to mitigate memorisation means that the technical reality on which the judgment was based may soon cease to be the norm. If models can be reliably designed or adjusted to be incapable of reproducing protected content, the legal concern identified in GEMA would lose its factual foundation.
The deeper challenge is that the law tends to crystallise around the technology it first encounters. As generative AI continues to evolve, the legal framework will need to be flexible enough to distinguish between models that memorise and models that do not — a distinction that, for now, remains both technically and legally unresolved.