Upstream problems in the realm of AI and Copyright

  1. Artificial Intelligence and the Art Market.

“Artificial Intelligence is developing fast. It will change our lives by improving healthcare, increasing the efficiency of farming, contributing to climate change mitigation and adaptation, improving the efficiency of production systems through predictive maintenance, increasing the security of Europeans, and in many other ways that we can only begin to imagine”. This is the opening of the White Paper on Artificial Intelligence, adopted by the European Commission on 19 February 2020[1], which correctly points out the important role that AI is beginning to play in our lives, in society and in the economy. In this context, many sectors will be affected, including the legal sector, where many basic and well-established principles may have to be adapted. Among all the issues that AI raises, this publication comments on the impact that it is having on the art market and on the regulation of copyright.

Human beings are no longer the only source of creativity, as AI systems are already capable of creating works of art[2]. Examples include Arimine Ryuta, created by the Sato-Matsuzaki Laboratory of Nagoya University, which wrote a mini novel chosen as a finalist in the Nikkei Hoshi Shinichi literary competition[3]; and BRUTUS, a program developed by Selmer Bringsjord and his collaborators, which was provided with rules of grammar and vocabulary, as well as a database of academic language and specific representations, from which it creates stories in the mystery genre whose main theme is betrayal[4]. We could cite many other systems with a relevant role in this market, such as Google’s Deep Dream[5], Magenta Music[6], Flow Machines[7], or the well-known New Rembrandt, which perfectly imitates the artist’s strokes and style[8].

These systems are trained through the analysis of large amounts of data, so that later, in an autonomous way, they can identify patterns, make predictions, and make specific decisions. As more data is entered, their results become more sophisticated and accurate. For example, the more images we upload to Google Photos, the more we help Google to improve its image recognition system[9]. This process is called Machine Learning, and within it we find Deep Learning, a subfield in which neural networks and feedback are bringing about a genuine revolution. From the data provided, the systems evolve, under their own rules, in ways that may be unexpected even for their creators[10]. The most significant advance in this field has come through generative adversarial networks (GANs), a set of algorithms that force two neural networks to compete with each other in order to learn and evolve on their own, learning to design all kinds of images and sounds with such a high degree of realism that they could be thought to have been produced by a natural person[11].

As far as copyright is concerned, we could say that AI systems are becoming producers of art, with their own creative capacity. So, the first question we may ask is whether an AI system can be considered an author, and whether the works of art it creates are protectable by copyright[12]. In my opinion, only works created by human beings can meet the originality standard required for copyright protection, since the CJEU has established a European concept of work that requires the expression of the author’s own intellectual creation. However, this is a highly controversial issue that will bring much debate in the coming years.

AI systems, however, are not only producers of art but also consumers of it, and this leads us to the so-called “upstream” problems, which are what we are going to analyse. The concept refers to the legal issues that arise when an AI system is trained on pre-existing works protected by copyright. While text and data mining (TDM) constitutes the basis of AI and ML, and is therefore fundamental to the training of systems, it is a practice that could infringe copyright or the sui generis right in databases. The hot topic, then, is whether we should prevent AI from learning from such works when its developers do not have a prior license or assignment of rights, and whether the current exceptions should be extended to the AI learning process, or a new one established, in order to allow the technology industry to develop its full potential[13].

  2. What is TDM and which rights could be infringed.

According to Art. 2(2) of the Digital Single Market (DSM) Directive[14], TDM means “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.

Firstly, we must ask whether this technique violates any of the exclusive rights contemplated in our legislation. TDM may also be carried out in relation to mere facts or data that are not protected by copyright and, in such cases, no authorization under copyright law is required[15]. In this respect we must distinguish between databases and their content. While raw data as such are excluded from copyright protection, databases may be protected by copyright if the selection or arrangement of the data is original, in the sense that it is the result of creative choices[16]; or by the sui generis right, provided that the maker proves that there has been a substantial investment, qualitative and/or quantitative, in obtaining, verifying or presenting the content[17]. So, for the time being, there is no exclusive right over the content of a database as such, nor, for example, over machine-generated data[18]. However, the European Commission, in its 2017 Communication on “Building a European data economy”, advanced the idea of creating at EU level a “data producer’s right” to protect industrial data erga omnes[19]. It has not been adopted so far, but we will see how it evolves and, if it is finally adopted, how it fits in with the TDM technique.

When we are dealing with a work protected by copyright, we must mainly analyse whether there are acts of reproduction and adaptation and, if so, whether any exception applies or whether we are dealing with an infringement. Some authors have argued that if the use made of the work for the training of an AI system does not touch any of its expressive elements, and does not cause any economic prejudice to the copyright owners, it should not be considered an infringement[20]. In fact, in the United States scholars argue that, although the courts have not expressly ruled on the legality of TDM without a license, this technique could be considered lawful under the doctrine of fair use. In this sense we find some cases, such as the one concerning the Google Books Library Project[21], which suggest that the use of copyrighted works for the non-expressive purpose of training AI models, such as for text mining and data mining, amounts to fair use[22].

In the European Union, however, the situation is different. Some are of the opinion that, according to the exceptions in copyright law, TDM tools that involve a minimal copy of a few words, or that crawl through the data and process each element separately, do not constitute an infringement[23]. Some authors have also advocated a normative interpretation of the right of reproduction, which would restrict its scope to exploitative uses “of the work as the work” and rule out non-exploitative uses such as mining. Nevertheless, I share the view that the EU legislator assumed that an express exception for TDM was required for reasons of legal certainty, which is found in Articles 3 and 4 of the DSM Directive[24]. Outside those provisions, only reproductions that fall within the mandatory exception for acts of temporary reproduction provided for in Article 5(1) of Directive 2001/29/EC[25] are legitimate. The other exceptions are not adapted to TDM activities.

In the case of databases protected by the sui generis right, the rights that TDM could affect are those of extraction and/or re-utilization of the whole or of a substantial part, evaluated qualitatively and/or quantitatively, of the contents of that database[26].

  3. TDM in the DSM Directive.

As mentioned above, the DSM Directive provides for two new exceptions for TDM, in Articles 3 and 4. The first refers to TDM carried out by research organisations and cultural heritage institutions for purposes of scientific research; the second refers to TDM in general, for all users and uses, commercial and non-commercial. However, we must take into account that there are big differences between the two articles.

Art. 3 requires that access to the work be “lawful”, which, according to Recital 14, covers the “access to content pursuant to contractual arrangements”, such as subscriptions and open access, as well as the “content that is freely available online”. The entities covered therefore have access to all material available on the Internet for which no subscription fee has to be paid, such as content on Facebook or YouTube[27]. In addition, any license used to disseminate such content must allow TDM activities, since the exception is mandatory and cannot be waived by contract. Art. 3(2) also allows the secure storage and retention of copies of mined works and other subject matter “for the purposes of scientific research, including for the verification of research results”.

Article 4, although broader in scope, contains an opt-out in paragraph 3, stating that the exception applies only “on condition that right holders have not expressly reserved their rights in respect of these uses in an appropriate manner, such as machine-readable means in the case of content made publicly available online”. To understand the term “in an appropriate manner” we should refer to Recital 18, which indicates that “it should only be considered appropriate to reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service. In other cases, it can be appropriate to reserve the rights by other means, such as contractual agreements or a unilateral declaration”. Therefore, simply by adding robots.txt-type metadata to their online content, right holders can exclude the TDM exception for commercial purposes[28].
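To make the idea of a machine-readable reservation more concrete, the following is a minimal sketch of how a mining tool might check a robots.txt file before processing a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the user-agent name `ExampleMiningBot` and the URL are hypothetical and chosen only for illustration, and the sketch does not claim that a robots.txt rule is, by itself, legally sufficient as a reservation under Art. 4(3).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a right holder might publish.
# The agent name "ExampleMiningBot" is an assumption for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleMiningBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made in this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/articles/1"  # hypothetical URL
    print(may_mine(EXAMPLE_ROBOTS_TXT, "ExampleMiningBot", page))  # False
    print(may_mine(EXAMPLE_ROBOTS_TXT, "OtherAgent", page))        # True
```

In this sketch the right holder blocks one named mining agent while leaving the content open to every other agent, which mirrors the choice between reserving and not reserving rights described above.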

This opt-out system, in my view, severely limits the effectiveness of this exception. Content owners can control the mining of their content, deciding whether to license or prohibit it, which is, in fact, what they could already do before the Directive. Even if some owners decide not to monetise the mining of their content, there will be others who wish to benefit economically from this possibility, which is now explicitly recognised in the Directive[29]. This may not be seriously detrimental to the large platforms, which already hold a large amount of content, but it could make it harder for European companies developing AI to compete with them[30].

  4. Conclusion.

Copyright is essential in promoting innovation, creation and culture. But nowadays, certain limitations and exceptions may also be essential in promoting the development of new technologies, and especially of AI[31].

In 2018 Japan updated its copyright law to include exemptions for the use of copyrighted works in machine learning. This reform focused on “allowing much-needed flexibility and legal certainty for innovators”, and its main objective was “to promote innovative digital and AI services that are emerging or will emerge in the future, primarily by removing ambiguity for using copyrighted works for understanding and analysis”. So, for Japan, copyright cannot be an obstacle to the development of AI. Other countries, such as China, Australia, Singapore and Thailand, are taking similar steps. Furthermore, as we have seen, U.S. copyright law, with its fair use doctrine, is more welcoming of the general practice of TDM than European law.

While the exception in Art. 3 of the DSM Directive is a welcome advance, the general exception in Art. 4 has been a failure. If the EU wants to “encourage innovation”[32], “become the most attractive, secure and dynamic data economy in the world”[33] and “benefit from the potential of AI, not only as a user but also as a creator and producer of this technology”[34], such protectionist practices are inconsistent and incoherent. Since the development of AI has not been facilitated in the specific field of copyright, let us hope that the EU at least does not revive its project of creating the “data producer’s right”, which would close off many more of the possibilities we have to make progress in this sector.


[1]White Paper on Artificial Intelligence: a European approach to excellence and trust, 19 February 2020, available at:

[2]A.M. BODEN, Artificial Intelligence and Natural Man, Basic Books, 2nd ed., Sussex, 1987, p. 75; A.M. BODEN, “Creativity and Artificial Intelligence”, Journal Artificial Intelligence – Special issue: artificial intelligence 40 years later archive, Volume 103 Issue 1-2, 1998, pp. 347-356.

[3]UCHIMURA, “La mini-novela japonesa escrita por un robot que casi se gana un premio literario”, Hanabi, 2016,

[4]D.A. FERRUCCI, “Artificial Intelligence and Literary Creativity: Inside the Mind of BRUTUS, a Storytelling Machine”, Rensselaer Polytechnic Institute and IBM T.J. Watson Research Center, 2010,





[9]A. LÓPEZ-TARRUELLA (2019) “La excepción de minería de textos y datos: una oportunidad perdida”, LVCENTINVS, available at:

[10]F. PUGLIESE and M. TESTI (2017), “Una panoramica introduttiva su Deep Learning e Machine Learning”, Machine Learning IT, available at:

[11]J. ROCCA (2019), “Understanding Generative Adversarial Networks (GANs)”, Towards Data Science, available at:

[12]Some interesting readings on this topic are: A. RAMALHO (2017), “Will robots rule the (artistic) world? A proposed model for the legal status of creations by artificial intelligence systems”, forthcoming in the Journal of Internet Law; D. GERVAIS (2019), “The Machine as Author”, Iowa Law Review, Vol. 105; J.C. GINSBURG and A.L. BUDIARDJO (2019), “Authors and Machines”, Columbia Public Law Research Paper No. 14-597, Berkeley Technology Law Journal, Vol. 34, No. 2, available at:

[13]K. ROBINSON (2020), “Copyrights in the Era of AI”, Adobe blogs, available at:

[14]Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC, ELI:

[15]Recital 9 DSM Directive.

[16]Art. 2.5, Berne Convention for the Protection of Literary and Artistic Works, 1886; Art. 10 TRIPS Agreement, 1994; Art. 5 WIPO Copyright Treaty, 1996; Art. 3 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, ELI:; Football Association Premier League Ltd and Others v QC Leisure and Others; and Karen Murphy v Media Protection Services Ltd, ECJ 4 October 2011, joined cases C-403/08 and C-429/08, ECR [2011] I-9083.

[17]Art. 7 EU Database Directive

[18]B. HUGENHOLTZ (2019), “Introducing a property right over data in the EU: the data producer’s right – an evaluation”, International Review of Law Computers & Technology, DOI: 10.1080/13600869.2019.1631621.

[19]European Commission, ‘Building A European Data Economy’, Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions, 10 January 2017, COM(2017), available at:

[20]M. SAG (2009), “Copyright and copy-reliant technology”, Northwestern University Law Review, Vol. 103 (4), pp. 1607-168

[21]Authors Guild v Google, Inc, No. 13-4829 (2d Cir. 2015), affirming Authors Guild v Google, Inc, 954 F.Supp.2d 282 (2013).

[22]E. ROSATI (2019), “Copyright as an Obstacle or an Enabler? A European Perspective on Text and Data Mining and Its Role in the Development of AI Creativity”, Asia Pacific Law Review, pp. 15-16, available at:

[23]D. SCHÖNBERGER (2018), “Deep Copyright: Up- and Downstream- Questions Related to Artificial Intelligence (AI) and Machine Learning (ML)”, Droit d’auteur 4.0 / Copyright 4.0, DE WERRA Jacques (ed.), Geneva / Zurich (Schulthess Editions Romandes), pp. 145-173.

[24]B. HUGENHOLTZ (2019), “The New Copyright Directive: Text and Data Mining (Articles 3 and 4)”, Kluwer Copyright Blog, available at:

[25]Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, DOUE-L-2001-81549

[26]Art. 7 EU Database Directive

[27]A. LÓPEZ-TARRUELLA (2019), supra.

[28]B. HUGENHOLTZ (2019), supra, p. 11.

[29]E. ROSATI (2019), supra, pp. 6, 7.

[30]C. GERRISH and A. MOLANDER SKAVLAN (2019), “European copyright law and the text and data mining exceptions and limitations: In light of the recent DSM Directive, is the EU approach a hindrance or facilitator to innovation in the region?”, Stockholm Intellectual Property Law Review, ISSN 2003-2382.

[31]F. VON LOHMANN (2016), “Google on what is driving creativity and innovation in the digital economy”, WIPO Magazine, available at:

[32]White Paper on Artificial Intelligence, supra, p. 3


