9 min read

Reddit sued Perplexity, Oxylabs UAB, AWM Proxy, and SerpApi in Manhattan federal court, alleging they scraped Reddit content via Google search results and resold it commercially in AI products and datasets.
Reddit argues the scraping damaged its value because the site has been building a business around search traffic and proprietary content.
The company told the court it has invested tens of millions of dollars in anti-scraping systems and plans to defend the value created by its users and its investments in better native search and AI partnerships.

According to the lawsuit, Reddit sent Perplexity a cease-and-desist letter in May 2024 demanding the company stop scraping Reddit unless it negotiated a deal, similar to arrangements Reddit struck with Google and OpenAI.
Reddit frames the letter as an attempt to be reasonable, offering commercial terms used by other large AI players.
The company now says Perplexity either ignored the demand or used workarounds, and Reddit is using the court process to force accountability and protect its business model as search competition heats up.

Reddit says Perplexity’s Reddit citations jumped fortyfold after a May 2024 cease-and-desist, indicating scraping continued and intensified despite blocks.
The filing claims that defendants used increasingly sophisticated methods to collect Reddit content without permission.
The lawsuit uses that data point to support claims of willful misconduct and to show the scope of the alleged harm. Reddit contends that the increased scraping undercuts its efforts to monetize search traffic and to control how its community’s work is reused by for-profit AI products.

Reddit says it planted a test post visible only to Google, the digital equivalent of a marked bill, to catch scrapers. The post’s contents soon appeared in Perplexity’s answers, suggesting Perplexity or a partner used Google results to copy Reddit material despite access controls.
That experiment is central to Reddit’s argument that the scraping was not accidental but the result of intentional circumvention.
By showing the post surfaced where it should not, Reddit aims to demonstrate concrete evidence that content meant only for search engine indexing was extracted and republished by third-party tools.

Perplexity said it doesn’t use Reddit content to train models and would respect Reddit’s robots.txt, per the lawsuit and press reports.
The company says it supports open access to public knowledge and aims to provide principled, responsible, factual answers without misusing user-generated content in its products or breaching policies.
Even so, Reddit’s filing says citations increased sharply after the warning, and the company alleges Perplexity worked with third-party scrapers to ingest Reddit posts into other AI products.
That disagreement over practice and intent sits at the heart of the litigation and will likely be a major point in court discovery.

Reddit claims Perplexity’s model depended on pulling Reddit content from Google results, feeding it into a third-party LLM, and selling the output as a new product. It also says Oxylabs and SerpApi supplied scraping tools enabling large-scale Reddit data collection.
The lawsuit frames those relationships as part of a market where scraped public content is sold to eager buyers training AI systems.
If the court agrees that scraping vendors and AI firms knowingly worked together, it could change how platforms and tool providers interact and how courts treat publicly facing data extracted via intermediaries.

The suit and subsequent coverage describe AWM Proxy as a former Russian botnet service; AWM Proxy could not be reached for comment.
Reddit casts defendants as illegal scrapers bypassing protections and selling data for AI training, bolstering claims of harm and unfair competition. Other named providers, like Oxylabs and SerpApi, have said they will defend themselves.
The presence of widely used scraping services in the complaint highlights a broader industry tension between platforms that want to control their content and vendors who build tools to harvest public web pages at scale.

Reddit’s lawsuit quotes a social media post from Cloudflare CEO Matthew Prince, who compared Perplexity to North Korean hackers for attempting to conceal web crawling activity.
That public criticism adds reputational pressure and signals that other infrastructure and security leaders view aggressive scraping tactics as unacceptable and harmful to the web ecosystem.
By including that quote, Reddit wants to show industry concern about opaque crawling practices and to underscore its argument that the defendants took steps to hide their behavior.
The reference will likely be used to support claims that the scraping was intentional and designed to evade safeguards.

Reddit says it has spent tens of millions of dollars over several years on anti-scraping systems to protect user content and platform integrity.
Those investments reflect how seriously the company treats unauthorized mass data collection and support its claim that scraping causes financial and strategic harm to its business efforts.
The expense of defending against scrapers has become part of Reddit’s rationale for seeking damages and injunctive relief.
As platforms invest more in protection, they may press courts to recognize the costs and to limit third parties that harvest content for commercial gain without permission.

Reddit said it is trying to become a true search destination by turning weekly user intent into activity on its native platform.
The company highlighted in its Q2 report that Reddit offers a unique breadth of conversations and human advice that can make the site a valuable search resource, distinct from other indexed results.
That strategy helps explain why Reddit objects to third parties mining its content, since building search value requires control over how posts appear in results and how that content is monetized.
The lawsuit can be read as part of a broader effort to protect search-related revenue and user engagement.

Reddit’s March 2024 deal with Google granted Google licensed access to Reddit content for training, while Reddit gained Vertex AI tools for search and features, showing Reddit prefers paid, controlled partnerships with major AI firms over unrestricted scraping.
By pointing to that deal, Reddit contrasts companies that made agreements with those it accuses of taking content without permission.
The comparison supports Reddit’s argument that reasonable commercial arrangements already exist and that some firms opted to bypass negotiation and protections instead of paying for access.

Reports in September 2025 indicated Perplexity secured new funding at a ~$20 billion valuation. The complaint says Perplexity relied on third-party scraped material to power answers and services while refusing a paid deal like those signed by larger partners, framing an imbalance in which AI startups capture commercial benefits without compensating the platforms hosting the content.
The claim raises broader questions about how value is shared between platforms, users, and companies that build AI services on top of public web material.

Reddit warns that scraping undermines its effort to convert search traffic into active platform users and to monetize that attention.
The company says its community conversations are a unique asset and that unauthorized reuse of those conversations weakens Reddit’s ability to grow native search and deliver value back to users and advertisers.
If Reddit wins stronger protections, it could control how its content appears in AI answers and search, which might preserve user privacy and platform revenue.
At the same time, tougher limits on scraping could change how many AI tools source publicly available information for training and answers.

Reddit seeks legal remedies that could include monetary damages and injunctions to block further scraping.
The complaint argues that defendants knowingly bypassed technical protections, sold scraped data, and contributed to unfair competition, which forms the basis for the requested relief in federal court.
How the court interprets the law on scraping, conventions like robots.txt, and the responsibilities of intermediaries will help determine the outcome.
A ruling for Reddit could strengthen platform control, while a different result could leave scraping vendors and AI firms with more leeway.

If Reddit wins, platforms could more easily block unauthorized scraping and seek compensation for commercial reuse, spurring more licensing deals like Reddit–Google and giving platforms leverage in negotiations with AI firms needing large training datasets.
Conversely, a ruling against Reddit could weaken platforms’ technical protections and legitimize broader third-party data harvesting, potentially shaping the business models of content hosts and AI companies, and redefining how online data is collected and monetized.

The case could push firms toward formal licensing agreements or force developers to adopt transparent, permission-based collection methods that respect platform rules and the investments companies make to protect conversations.
Alternatively, the dispute could expose gaps in web-scraping law and spur calls for clearer rules. Either way, platform owners, AI companies, and policymakers will watch as they try to balance innovation with creators’ rights.
Will Reddit’s action change how AI firms source training data and how platforms protect user content?
This slideshow was made with AI assistance and human editing.