The Implications of OpenAI’s Deals with Publishers for Competitors
OpenAI’s legal battle with The New York Times over the data used to train its AI models is still ongoing. Meanwhile, OpenAI is pressing ahead with deals with other publishers, including major news outlets in France and Spain.
On Wednesday, OpenAI announced agreements with Le Monde and Prisa Media to bring French and Spanish news content to its ChatGPT chatbot. According to an OpenAI blog post, the partnerships will put the publishers’ current events coverage, from brands including El País, Cinco Días, As and El Huffpost, in front of ChatGPT users and add to OpenAI’s growing pool of training data.
The statement from OpenAI reads:
Over the following months, ChatGPT users can interact with applicable news material from these publishers through chosen summaries, complete with attribution and enhanced links to the original articles. This provides users the opportunity to access extra information or related articles from the publishers’ news sites. We are constantly improving ChatGPT and reinforcing the critical role of the news industry in providing real-time, accurate information to users.
OpenAI has now disclosed licensing agreements with several content providers, so this seems a good moment to take stock.
OpenAI has not disclosed how much it pays each publisher, but we can make an educated guess. The Information reported in January that OpenAI was offering publishers between $1 million and $5 million a year for access to their archives to train generative AI models. That doesn’t tell us much about the Shutterstock partnership, but it does allow some inference about article licensing. Assuming The Information’s reporting is accurate and the figures haven’t changed since, OpenAI appears to be spending between $4 million and $20 million a year on news content.
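The $4 million to $20 million range follows from simple multiplication. Here is a minimal sketch of that arithmetic, assuming four news licensing deals, each priced within the reported $1 million to $5 million band. The deal count is an inference from the totals quoted here, not a figure stated in The Information’s report, and the names used are illustrative.

```python
# Per-publisher range as reported by The Information (USD per year).
PER_PUBLISHER_LOW = 1_000_000
PER_PUBLISHER_HIGH = 5_000_000

# ASSUMPTION: four news deals, inferred from the $4M-$20M total above.
NEWS_DEALS = 4


def annual_spend_range(deals: int, low: int, high: int) -> tuple[int, int]:
    """Return the (minimum, maximum) yearly spend across all deals."""
    return deals * low, deals * high


low_total, high_total = annual_spend_range(
    NEWS_DEALS, PER_PUBLISHER_LOW, PER_PUBLISHER_HIGH
)
print(f"Estimated news licensing spend: ${low_total:,} to ${high_total:,} per year")
```

Changing `NEWS_DEALS` shows how quickly the estimate scales as more publishers sign on.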
For OpenAI, that may look like pocket change next to its war chest of more than $11 billion and annual revenue that has passed the $2 billion mark, according to the Financial Times. But as Hunter Walk, a partner at Homebrew and co-founder of Screendoor, suggests, the sum is large enough to edge out AI rivals pursuing similar licensing deals.
Walk writes on his blog:
If experimentation is gated by nine figures worth of licensing deals, we are doing a disservice to innovation … The checks being cut to ‘owners’ of training data are creating a huge barrier to entry for challengers. If Google, OpenAI, and other large tech companies can establish a high enough cost, they implicitly prevent future competition.
Now, whether there’s a barrier to entry today is debatable. Many — if not most — AI vendors have chosen to risk the wrath of IP holders, opting not to license the data on which they’re training AI models. There’s evidence that art-generating platform Midjourney, for example, is training on Disney movie stills — and Midjourney has no deal with Disney.
The tougher question to wrestle with is: Should licensing simply be the cost of doing business and experimentation in the AI space?
Walk suggests a regulator-approved “safe harbor” that would shield every AI vendor, as well as small startups and researchers, from legal liability, provided they adhere to specific transparency and ethics standards.
The U.K. recently tried to enact something similar, exempting text and data mining for AI training from copyright rules when the purpose is research. Those efforts ultimately failed.
I’m not sure I would go as far as Walk does with his “safe harbor” proposal, given the effect AI could have on an already unstable news industry. A recent study from The Atlantic found that if a search engine like Google were to build AI into search, it would likely answer a user’s query 75% of the time without requiring a click-through to the publisher’s website.
Nevertheless, there might be some opportunity for exceptions.
Publishers should be compensated fairly. But could they be paid in a way that also gives AI challengers and academics access to the same data as the incumbents? It seems possible. Grants and larger venture capital investments are two potential routes.
I don’t have a concrete solution, especially since the courts have yet to decide whether fair use shields AI vendors from copyright claims. Still, these questions need to be worked out. Otherwise, the industry risks an academic “brain drain” that leaves only a handful of powerful companies with access to rich, valuable pools of training data.