In every hype cycle, certain patterns of deception emerge. With the last cryptocurrency boom, it was “Ponzinomics” and “rug-pull.” With self-driving cars, it was “5 years away!” With AI, it’s a question of how much unethical behavior you can get away with.
The controversy, in essence: a middleman feeding off the quality sources it depends on.
Perplexity, which is in talks to raise hundreds of millions of dollars, is trying to build a competitor to Google Search. But instead of a “search engine,” Perplexity is building an “answer engine.” The idea is that instead of combing through a ton of results from primary sources to answer your question yourself, you simply get the answer Perplexity finds for you. “Facts and accuracy are what we care about,” Perplexity CEO Aravind Srinivas told The Verge.
So Perplexity is basically a middleman whose interests run against those of the high-quality sources it draws on. The original value proposition of search was that, in exchange for scraping the work done by journalists and others, Google’s results would send traffic to those sources. But these so-called “answer engines” starve primary sources of advertising revenue by serving up answers instead of directing people to click through to the source, hogging that revenue for themselves. Perplexity belongs to a group of vampires that also includes Arc Search and Google itself.
But Perplexity goes a step further with its Pages product, creating summary “reports” based on primary sources — not just quoting a sentence or two to directly answer a user’s question, but generating entire aggregated articles — and doing so in a way that actively plagiarizes the sources it draws on.
Forbes found that Perplexity had circumvented the magazine’s paywall to publish a summary of its investigation into former Google CEO Eric Schmidt’s drone company. While Forbes has a metered paywall on some articles, premium articles like the investigation sit behind a strict paywall. Not only did Perplexity manage to get around the paywall, it barely cited the original investigation and reused the original art in its own write-up. (For those keeping track at home, the art, at least, is clear-cut copyright infringement.)
While aggregation isn’t a particularly new phenomenon, the scale at which Perplexity can aggregate — and its outright piracy of original art — is, well, striking. In an attempt to calm everyone down, the company’s chief business officer went to Axios to say that Perplexity is developing a revenue-sharing plan with publishers — but, man, why is everyone being so mean about a product that’s still in development?
At this point, developer Robb Knight ran an experiment, and Wired stepped in and confirmed his findings: Perplexity’s scraping of Forbes articles was not an anomaly. In fact, Perplexity was ignoring robots.txt, the file that explicitly asks web crawlers not to scrape certain pages. Srinivas told Fast Company that Perplexity wasn’t actually ignoring robots.txt, but was simply using a third-party scraper that did. Srinivas refused to name the third-party scraper, and wouldn’t say whether he had asked it to stop violating robots.txt.
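For the curious, robots.txt is just a plain-text convention, and honoring it takes a few lines of code. Here’s a minimal sketch using Python’s standard library showing what a well-behaved crawler is supposed to do before fetching a page; the rules and URL are illustrative assumptions, not any publisher’s actual file:

```python
from urllib import robotparser

# A hypothetical robots.txt like those publishers deploy:
# block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks before fetching:
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```

Nothing enforces that check — the whole system is an honor code, which is exactly why it breaks down when a crawler (or the third party it hires) decides not to ask.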
“Someone else did it” is a fine argument for a 5-year-old. And let’s think about that response further. If Srinivas wanted to act ethically, he had a few options. Option 1: terminate the contract with the third-party scraper. Option 2: convince the scraper to respect robots.txt. Srinivas committed to neither, and it seems to me there’s a clear reason why: even if Perplexity itself doesn’t violate robots.txt, its “answer engine” depends on someone else violating it in order to function.
To make matters worse, Perplexity plagiarized an article from Wired even though Wired had explicitly blocked Perplexity’s crawler in its robots.txt file. While most of the Wired piece on the plagiarism is about legal remedies, I’m more interested in what’s happening to robots.txt. It’s a good-faith agreement that has held for decades, but it’s falling apart as unscrupulous AI companies (and not just Perplexity) suck up whatever they can get their hands on to train their bullshit models. Remember when Srinivas said that “facts and accuracy” are what Perplexity cares about? I’m not sure that’s true either. As Forbes reports, Perplexity is now surfacing AI-generated results and actual misinformation.
Many big AI companies have engaged in questionable, even unethical practices to get the data they want. To prove Perplexity’s value to investors, Srinivas built a tool to scrape Twitter, posing as an academic researcher with API access for research purposes. He described inventing fake research projects to do it — “BrinRank and all that kind of stuff,” Srinivas said on Lex Fridman’s podcast. I assume “BrinRank” is a reference to Google co-founder Sergey Brin, but it sounded to me like Srinivas was bragging about how charming and clever his lies were.
That’s not me talking — that’s the CEO himself, describing a foundation of lies told to circumvent the established principles that underpin the web. And it makes the actual value proposition of the “answer engine” clear: Perplexity cannot generate real information on its own; it relies on third parties willing to violate the policies of the sources it feeds on. The “answer engine” was built by people who are willing to lie whenever it suits them, and that willingness is necessary for Perplexity to work.
So, this is the true innovation of Perplexity: it has destroyed the foundation of trust that the Internet was built on. The question is, will users and investors care?