Report: Reddit Allegedly Selling Data for AI Training Purposes

Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be spotted at tech conferences with a strong coffee in one hand and a laptop in the other. If it’s geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@gadgetry@techhub.social).

Reddit has negotiated a content licensing deal to allow its data to be used for training AI models, according to a Bloomberg report.

Just ahead of a potential $5 billion initial public offering (IPO) in March, Reddit has reportedly signed a $60 million deal with an undisclosed major AI company. The move could be seen as a last-minute effort to show prospective investors a potential revenue stream in the rapidly growing AI industry.

Although Reddit has yet to confirm the deal, the decision could have significant implications. If true, it would mean that Reddit’s vast trove of user-generated content – including posts from popular subreddits, comments from both prominent and obscure users, and discussions on a wide range of topics – could be used to train and enhance existing large language models (LLMs) or provide the foundation for the development of new generative AI systems.

Nevertheless, Reddit’s decision may not be well received by its members, given the growing community backlash concerning its recent commercial choices.

Last year, Reddit’s decision to charge for access to its application programming interfaces (APIs) led thousands of Reddit forums to shut down temporarily in protest. A few days later, a group of hackers threatened to release previously stolen site data unless Reddit scrapped the API proposal or handed over a $4.5 million ransom.

More recently, Reddit has made other contentious decisions, such as deleting years’ worth of private chat logs and messages from members’ accounts. It also introduced new automated moderation features and removed the option for users to opt out of personalized advertising, which caused further dissatisfaction.

The potential arrangement to sell Reddit’s data for AI training could spark even more user opposition, as the ethical debate over using public data, artwork, and other human-created content to train AI systems continues to escalate across industries and platforms.

