Perplexity wants to change how we use the internet, but the AI search startup backed by Jeff Bezos might be breaking the internet’s rules to do so. The company appears to be ignoring a widely accepted web standard, the Robots Exclusion Protocol, to scrape parts of the web that site operators don’t want bots to access, according to a report from developer Robb Knight this week that was confirmed by Wired.
Perplexity’s service summarizes articles on the web, claiming to deliver “reliable answers” with “no need to click on different links,” as noted in a blog post. To do that, Wired and Knight found, Perplexity ignores robots.txt files, the instructions site operators deliberately write to block web crawlers. The two reports found that Perplexity uses an unlisted IP address to circumvent these robots.txt files and scrape the websites anyway. Wired says its website blocked Perplexity’s web crawler earlier in 2024, but the AI search engine is still capable of summarizing its articles in detail.
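For readers unfamiliar with the mechanics, here is a minimal sketch of how a well-behaved crawler is expected to consult robots.txt before fetching a page, using Python’s standard-library parser. The robots.txt contents, the user-agent token `PerplexityBot`, the helper `is_allowed`, and the example.com URL are illustrative assumptions, not taken from Wired’s or Knight’s sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# The user-agent token is an assumption used for illustration.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/story"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", url))  # blocked crawler
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))   # any other crawler
```

The check runs entirely on the crawler’s side. Nothing in the protocol stops a bot from skipping it or from fetching pages under a different identity, which is why compliance is a matter of convention rather than enforcement.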
Despite this, Perplexity claims to respect the Robots Exclusion Protocol in documentation on its website. Perplexity CEO Aravind Srinivas told Wired the reporters had “a deep and fundamental misunderstanding of how Perplexity and the Internet work,” but did not dispute the findings directly. Gizmodo reached out to Perplexity to ask for a more detailed response and will update the article if we hear back.
Separately, Perplexity is facing legal threats over a different kind of rule: copyright. Forbes reportedly threatened legal action against Perplexity this week, after accusing the AI startup of ripping off Forbes reporting without proper attribution. Forbes had done original reporting on former Google CEO Eric Schmidt’s AI drone venture, and Perplexity created AI-generated articles, podcasts, and videos using Forbes’ text and images. The executive editor of Forbes called out Perplexity on X earlier in the month.
Our reporting on Eric Schmidt’s stealth drone project was posted this AM by @perplexity_ai . It rips off most of our reporting. It cites us, and a few that reblogged us, as sources in the most easily ignored way possible. Note the views. #zeroclick https://t.co/qZamti9E83 pic.twitter.com/8z2AsyHjgM
— John Paczkowski (@JohnPaczkowski) June 7, 2024
Perplexity’s product, though useful, diverts traffic away from the sites it draws on. Google also indexes webpages and offers short AI summaries, but it points traffic directly toward the web pages the information comes from. Perplexity is effectively writing detailed AI articles, so users have little reason to click through to the original websites, which breaks the business model of digital media.
OpenAI has forged partnerships with media companies to address this, paying them upfront to license content. Perplexity is reportedly working on similar content partnerships, though it aims to share revenue rather than pay a flat fee as OpenAI does. But these partnerships don’t exist yet, so for now, Perplexity appears to be jumping paywalls and scraping websites to take all the information it needs to power its AI answers.