Reddit sues Perplexity, says AI firm stole data

Reddit sues Perplexity and others

Reddit sued Perplexity, Oxylabs UAB, AWM Proxy, and SerpApi in Manhattan federal court, alleging that they scraped Reddit content through Google search results and resold it commercially in AI products and datasets.

Reddit argues the scraping damaged its value because the site has been building a business around search traffic and proprietary content.

The company told the court it has invested tens of millions of dollars in anti-scraping systems and plans to defend the value created by its users and its investments in better native search and AI partnerships.

Cease and desist ignored, Reddit says

According to the lawsuit, Reddit sent Perplexity a cease-and-desist letter in May 2024 demanding that the company stop scraping Reddit unless it negotiated a deal similar to the arrangements Reddit struck with Google and OpenAI.

Reddit frames the letter as an attempt to be reasonable, offering commercial terms used by other large AI players.

The company now says Perplexity either ignored the demand or used workarounds, and Reddit is using the court process to force accountability and protect its business model as search competition heats up.

Citations surged fortyfold after notice

Reddit says Perplexity’s citations of Reddit jumped fortyfold after the May 2024 cease-and-desist letter, which it argues shows that scraping continued and intensified despite its blocks.

The filing claims that defendants used increasingly sophisticated methods to collect Reddit content without permission.

The lawsuit uses that data point to support claims of willful misconduct and to show the scope of the alleged harm. Reddit contends that the increased scraping undercuts its efforts to monetize search traffic and to control how its community’s work is reused by for-profit AI products.

Test post acted like marked bill

Reddit says it planted a test post that only Google's search crawler could access, a digital marked bill, to catch scrapers. The post’s contents soon appeared in Perplexity’s answers, which Reddit says suggests that Perplexity or a partner used Google results to copy Reddit material despite access controls.

That experiment is central to Reddit’s argument that the scraping was not accidental but the result of intentional circumvention.

By showing the post surfaced where it should not, Reddit aims to demonstrate concrete evidence that content meant only for search engine indexing was extracted and republished by third-party tools.

Perplexity says it respects robots.txt

Perplexity said it doesn’t use Reddit content to train models and would respect Reddit’s robots.txt, per the lawsuit and press reports.

The company says it supports open access to public knowledge and aims to provide principled, responsible, factual answers without misusing user-generated content in its products or breaching policies.

Even so, Reddit’s filing says citations increased sharply after the warning, and the company alleges Perplexity worked with third-party scrapers to ingest Reddit posts into other AI products.

That disagreement over practice and intent sits at the heart of the litigation and will likely be a major point in court discovery.
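
For readers unfamiliar with robots.txt, the sketch below shows how a crawler that chooses to honor the file might check it before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt rules, the bot names `ExampleSearchBot` and `ExampleAnswerBot`, and the `example.com` URL are all invented for illustration; they are not taken from Reddit's actual file or from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
# It admits one named crawler and turns every other crawler away.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL standing in for a discussion thread.
thread_url = "https://example.com/r/some-thread"

search_ok = is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleSearchBot", thread_url)
other_ok = is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleAnswerBot", thread_url)

print(search_ok)  # the named search crawler is permitted
print(other_ok)   # any other crawler is not
```

The check is purely advisory: the file states the site's wishes, and nothing in the protocol stops a crawler that skips the check. That is one reason the dispute turns on what the defendants allegedly did rather than on what the file says.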

Allegation of third-party scraping

Reddit claims Perplexity’s model depended on pulling Reddit content from Google results, feeding it into a third-party LLM, and selling the output as a new product. It also says Oxylabs and SerpApi supplied scraping tools enabling large-scale Reddit data collection.

The lawsuit frames those relationships as part of a market where scraped public content is sold to eager buyers training AI systems.

If the court agrees that scraping vendors and AI firms knowingly worked together, it could change how platforms and tool providers interact and how courts treat publicly facing data extracted via intermediaries.

AWM Proxy labeled former botnet

The suit and subsequent coverage describe AWM Proxy as a former Russian botnet service; AWM Proxy could not be reached for comment.

Reddit casts defendants as illegal scrapers bypassing protections and selling data for AI training, bolstering claims of harm and unfair competition. Other named providers, like Oxylabs and SerpApi, have said they will defend themselves.

The presence of widely used scraping services in the complaint highlights a broader industry tension between platforms that want to control their content and vendors who build tools to harvest public web pages at scale.

Cloudflare chief called out Perplexity

Reddit’s lawsuit quotes a social media post from Cloudflare CEO Matthew Prince, who compared Perplexity to North Korean hackers for attempting to conceal web crawling activity.

That public criticism adds reputational pressure and signals that other infrastructure and security leaders view aggressive scraping tactics as unacceptable and harmful to the web ecosystem.

By including that quote, Reddit wants to show industry concern about opaque crawling practices and to underscore its argument that the defendants took steps to hide their behavior.

The reference will likely be used to support claims that the scraping was intentional and designed to evade safeguards.

Reddit says it invested tens of millions in anti-scraping

Reddit says it has spent tens of millions of dollars over several years on anti-scraping systems to protect user content and platform integrity.

Those investments reflect how seriously the company treats unauthorized mass data collection and support its claim that scraping causes financial and strategic harm to its business efforts.

The expense of defending against scrapers has become part of Reddit’s rationale for seeking damages and injunctive relief.

As platforms invest more in protection, they may press courts to recognize the costs and to limit third parties that harvest content for commercial gain without permission.

Reddit positions itself for search

Reddit said it is trying to become a true search destination by turning weekly user intent into activity on its native platform.

The company highlighted in its Q2 report that Reddit offers a unique breadth of conversations and human advice that can make the site a valuable search resource, distinct from other indexed results.

That strategy helps explain why Reddit objects to third parties mining its content, since building search value requires control over how posts appear in results and how that content is monetized.

The lawsuit can be read as part of a broader effort to protect search-related revenue and user engagement.

Google partnership and deal context

Reddit’s deal with Google, announced in February 2024, granted Google licensed access to Reddit content for training, while Reddit gained Vertex AI tools for search and other features. The arrangement shows that Reddit prefers paid, controlled partnerships with major AI firms over unrestricted scraping.

By pointing to that deal, Reddit contrasts companies that made agreements with those it accuses of taking content without permission.

The comparison supports Reddit’s argument that reasonable commercial arrangements already exist and that some firms opted to bypass negotiation and protections instead of paying for access.

Perplexity valuation and business model

Reports in September 2025 indicated that Perplexity secured new funding at a valuation of roughly $20 billion. The complaint says Perplexity relied on third-party scraped material to power its answers and services while refusing a paid deal like those struck by larger partners.

Reddit frames this as an imbalance in which AI startups capture commercial benefits without compensating the platforms that host the content.

The claim raises broader questions about how value is shared between platforms, users, and companies that build AI services on top of public web material.

Impact on Reddit search and users

Reddit warns that scraping undermines its effort to convert search traffic into active platform users and to monetize that attention.

The company says its community conversations are a unique asset and that unauthorized reuse of those conversations weakens Reddit’s ability to grow native search and deliver value back to users and advertisers.

If Reddit wins stronger protections, it could control how its content appears in AI answers and search, which might preserve user privacy and platform revenue.

At the same time, tougher limits on scraping could change how many AI tools source publicly available information for training and answers.

Legal stakes include damages and control

Reddit seeks legal remedies that could include monetary damages and injunctions to block further scraping.

The complaint argues that defendants knowingly bypassed technical protections, sold scraped data, and contributed to unfair competition, which forms the basis for the requested relief in federal court.

How the court interprets the law on scraping, the weight of conventions like robots.txt, and the responsibilities of intermediaries will help determine the outcome.

A ruling for Reddit could strengthen platform control, while a different result could leave scraping vendors and AI firms with more leeway.

What a win for Reddit could mean

If Reddit wins, platforms could more easily block unauthorized scraping and seek compensation for commercial reuse, spurring more licensing deals like Reddit–Google and giving platforms leverage in negotiations with AI firms needing large training datasets.

Conversely, a ruling against Reddit could weaken platforms’ technical protections and legitimize broader third-party data harvesting, potentially shaping the business models of content hosts and AI companies, and redefining how online data is collected and monetized.

What do you think?

The case could push firms toward formal licensing agreements or force developers to adopt transparent, permission-based collection methods that respect platform rules and the investments companies make to protect conversations.

Alternatively, the dispute could expose gaps in web-scraping law and spur calls for clearer rules. Either way, platform owners, AI companies, and policymakers will watch as they try to balance innovation with creators’ rights.

Will Reddit’s action change how AI firms source training data and how platforms protect user content? Share your thoughts in the comments.

This slideshow was made with AI assistance and human editing.
