AI Detectors Get It Wrong. Writers Are Being Fired Anyway

Kimberly Gasuras doesn’t use AI. “I don’t need it,” she said. “I’ve been a news reporter for 24 years. How do you think I did all that work?” That logic wasn’t enough to save her job.

As a local journalist in Bucyrus, Ohio, Gasuras relies on side hustles to pay the bills. For a while, she made good money on a freelance writing platform called WritersAccess, where she wrote blogs and other content for small and midsize companies. But halfway through 2023, the income plummeted as some clients switched to ChatGPT for their writing needs. It was already a difficult time. Then the email came.

“I only got one warning,” Gasuras said. “I got this message saying they’d flagged my work as AI using a tool called ‘Originality.’” She was dumbfounded. Gasuras wrote back to defend her innocence, but she never got a response. Originality costs money, but Gasuras started running her work through other AI detectors before submitting to make sure she wasn’t getting dinged by mistake. A few months later, WritersAccess kicked her off the platform anyway. “They said my account was suspended due to excessive use of AI. I couldn’t believe it,” Gasuras said. WritersAccess did not respond to a request for comment.

When ChatGPT set the world on fire a year and a half ago, it sparked a feverish search for ways to catch people trying to pass off AI text as their own writing. A host of startups launched to fill the void through AI detection tools, with names including Copyleaks, GPTZero, Originality.AI, and Winston AI. It makes for a tidy business in a landscape full of AI boogeymen.


These companies advertise peace of mind, a way to take back control through “proof” and “accountability.” Some advertise accuracy rates as high as 99.98%. But a growing body of experts, studies, and industry insiders argue these tools are far less reliable than their makers promise. There’s no question that AI detectors make frequent mistakes, and innocent bystanders get caught in the crossfire. Countless students have been accused of AI plagiarism, but a quieter epidemic is happening in the professional world. Some writing gigs are drying up thanks to chatbots. As people fight over the dwindling field of work, writers are losing jobs over false accusations from AI detectors.

“This technology doesn’t work the way people are advertising it,” said Bars Juhasz, co-founder of Undetectable AI, which makes tools to help people humanize AI text to sneak it past detection software. “We have a lot of concerns around the reliability of the training process these AI detectors use. These guys are claiming they have 99% accuracy, and based on our work, I think that’s impossible. But even if it’s true, that still means for every 100 people there’s going to be one false flag. We’re talking about people’s livelihoods and their reputations.”
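Juhasz's arithmetic can be made concrete with a rough sketch. It assumes, as he does, that "99% accuracy" translates into a 1% chance of falsely flagging any given human-written piece, and it further assumes each check is independent of the others. Neither assumption comes from a detector company, and the function names are illustrative only.

```python
def expected_false_flags(n_documents: int, false_positive_rate: float) -> float:
    """Average number of human-written documents wrongly flagged as AI."""
    return n_documents * false_positive_rate


def chance_of_any_false_flag(n_documents: int, false_positive_rate: float) -> float:
    """Probability that at least one of n independent checks is a false flag."""
    return 1 - (1 - false_positive_rate) ** n_documents


# Assumed 1% false positive rate, per Juhasz's reading of a "99% accurate" tool.
rate = 0.01
print(expected_false_flags(100, rate))               # about one false flag per 100 documents
print(round(chance_of_any_false_flag(100, rate), 3))  # chance a 100-article writer is flagged at least once
```

Under those assumptions, a writer who submits 100 honest articles has roughly a 63% chance of being flagged at least once.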

Safeguard, or snake oil?

In general, AI detectors work by spotting the hallmarks of AI penmanship, such as perfect grammar and punctuation. In fact, it seems one of the easiest ways to get your work flagged is to use Grammarly, a tool that checks for spelling and grammatical errors. It even suggests ways to rewrite sentences using, you guessed it, artificial intelligence. Adding insult to injury, Gizmodo spoke to writers who said they were fired by platforms that required them to use Grammarly. (Gizmodo confirmed the details of these stories, but we are excluding the names of certain freelance platforms because writers signed non-disclosure agreements.)

Writers, experts, and even AI detection companies themselves said that using Grammarly can get your writing flagged as AI-generated. However, Jenny Maxwell, Grammarly’s head of education, disputed those claims. “There is no evidence linking AI detection flags and the use of Grammarly suggestions. Suggestions like our clarity rewrites are not powered by generative AI,” Maxwell said. Grammarly does offer generative AI tools that write content from scratch, though these suggestions don’t appear automatically. These features “should and would” trigger AI detection, she said.

Detectors look for more telling factors as well, such as “burstiness.” Human writers are more likely to reuse certain words in clusters or bursts, while AI is more likely to distribute words evenly across a document. AI detectors can also assess “perplexity,” which essentially asks an AI to measure the likelihood that it would have produced a piece of text given the model’s training data. Some companies, such as industry leader Originality.AI, train their own AI language models specifically to detect the work of other AIs; these models are meant to spot patterns that are too complex for the human mind.
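To give a feel for the two signals, here is a toy sketch. It is not how any commercial detector works: real tools rely on large language models, while this uses a simple word-count model as a stand-in. The function names, the gap-based burstiness measure, and the sample sentences are all assumptions made for illustration.

```python
import math
import statistics
from collections import Counter


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under a word-frequency model built from `reference`.

    Lower values mean the text looks more predictable to the model.
    Add-one smoothing gives unseen words a small nonzero probability.
    """
    ref_tokens = reference.lower().split()
    counts = Counter(ref_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    total = len(ref_tokens)

    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")

    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))


def gap_burstiness(tokens: list[str], word: str) -> float:
    """Spread of the gaps between repeats of `word`, relative to the mean gap.

    0.0 means the word is evenly spaced; larger values mean it clusters.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0
    return statistics.pstdev(gaps) / statistics.mean(gaps)


reference = "the cat sat on the mat the cat ate"
print(unigram_perplexity("the cat", reference))        # familiar words: lower perplexity
print(unigram_perplexity("zebra quantum", reference))  # unseen words: higher perplexity

evenly_spaced = "x a b x a b x a b x".split()
clustered = "x x x a b c d e f x".split()
print(gap_burstiness(evenly_spaced, "x"))  # evenly spaced repeats
print(gap_burstiness(clustered, "x"))      # repeats bunched together
```

In this toy setup, a detector would treat low perplexity and low burstiness as hints of machine authorship. A careful human writer can produce both, which is one reason false flags happen.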

However, none of these techniques are foolproof, and many major institutions have backed away from this class of tools. OpenAI released its own AI detector to quell fears about its products in 2023 but pulled the tool off the market just months later “due to its low rate of accuracy.” The academic world was first to adopt AI detectors, but false accusations pushed a long list of universities to ban the use of AI detection software, including Vanderbilt, Michigan State, Northwestern, and the University of Texas at Austin.

AI detection companies “are in the business of selling snake oil,” said Debora Weber-Wulff, a professor at the University of Applied Sciences for Engineering and Economics in Berlin, who co-authored a recent paper about the effectiveness of AI detection. According to Weber-Wulff, research shows that AI detectors are inaccurate, unreliable, and easy to fool. “People want to believe that there can be some magic software that solves their problems,” she said. But “computer software cannot solve social problems. We have to find other solutions.”

The companies that make AI detectors say they’re a necessary but imperfect tool in a world inundated by robot-generated text. There’s a significant demand for these services, whether or not they’re effective.

Alex Cui, chief technology officer for the AI detection company GPTZero, said detectors have meaningful shortcomings, but the benefits outweigh the drawbacks. “We see a future where, if nothing is changed, the internet becomes more and more dictated by AI, whether it’s news, peer-reviewed articles, marketing. You don’t even know if the person you’re talking to on social media is real,” Cui said. “We need a solution for confirming knowledge en masse, and determining whether content is high quality, authentic, and of legitimate authorship.”

A necessary evil?

Mark, another Ohio-based copywriter who asked that we withhold his name to avoid professional repercussions, said he had to take work doing maintenance at a local store after an AI detector cost him his job.

“I got an email saying my most recent article had scored a 95% likelihood of AI generation,” Mark said. “I was in shock. It felt ridiculous that they’d accuse me after working together for three years, long before ChatGPT was available.”

He tried to push back. Mark sent his client a copy of the Google Doc where he drafted the article, which included timestamps that demonstrated he wrote the document by hand. It wasn’t enough. Mark’s relationship with the writing platform fell apart. He said losing the job cost him 90% of his income.

“We hear these stories more than we wish we did, and we understand the pain that false positives cause writers when the work they poured their heart and soul into gets falsely accused,” said Jonathan Gillham, CEO of Originality.AI. “We feel like we’re building a tool to help writers, but we know that at times it does have some consequences.”

But according to Gillham, the problem is about more than helping writers or providing accountability. “Google is aggressively going after AI spam,” he said. “We’ve heard from companies that had their entire site de-indexed by Google that said they didn’t even know their writers were using AI.”

It’s true that the internet is being flooded by low-effort content farms that pump out junky AI articles in an effort to game search results, get clicks, and make ad money from those eyeballs. Google is cracking down on these sites, which leads some companies to believe that their websites will be down-ranked if Google detects any AI writing whatsoever. That’s a problem for web-based businesses, and increasingly the No. 1 selling point for AI detectors. Originality promotes itself as a way to “future proof your site on Google” at the top of the list of benefits on its homepage.

A Google spokesperson said this completely misinterprets the company’s policies. Google, a company that provides AI, said it has no problem with AI content in and of itself. “It’s inaccurate to say Google penalizes websites simply because they may use some AI-generated content,” the spokesperson said. “As we’ve clearly stated, low value content that’s created at scale to manipulate Search rankings is spam, however it is produced. Our automated systems determine what appears in top search results based on signals that indicate if content is helpful and high quality.”

Mixed messages

No one claims AI detectors are perfect, including the companies that make them. But Originality and other AI detectors send mixed messages about how their tools should be used. For example, Gillham said “we advise against the tool being used within academia, and strongly recommend against being used for disciplinary action.” He explained the risk of false positives is too high for students, because they submit a small number of essays throughout a school year, but the volume of work produced by a professional writer means the algorithm has more chances to get it right. However, in one of the company’s blog posts, Originality says AI detection is “essential” in the classroom.

Then there are questions about how the results are presented. Many of the writers Gizmodo spoke to said their clients don’t understand the limitations of AI detectors or even what the results are actually saying. It’s easy to see how someone might be confused: I ran one of my own articles through Originality’s AI detector. The results were “70% Original” and “30% AI.” You might assume that means Originality determined that 30% of the article was written by a chatbot, especially because the tool highlights specific sentences it finds suspect. However, it’s actually a confidence score; Originality is 70% sure a human wrote the text. (I wrote the whole thing myself, but you’ll just have to take my word for it.)

Then there’s the way the company describes its algorithm. According to Originality, the latest version of its tool has a 98.8% accuracy rate, but Originality also says its false positive rate is 2.8%. If you’ve got your calculator handy, you’ll notice that adds up to more than 100%. Gillham said that’s because these numbers come from two different tests.
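Separately from Gillham's explanation, it helps to see that accuracy and false positive rate are different measurements with different denominators, so they need not sum to 100% even within a single test. The sketch below uses invented counts, chosen only so the arithmetic lands on the two published figures. They are not Originality's data, and the function names are illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of ALL texts, human and AI, that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of HUMAN-written texts that were wrongly flagged as AI."""
    return fp / (fp + tn)


# Hypothetical test set: 250 human-written texts and 750 AI-written texts.
tn = 243  # human texts correctly labeled human
fp = 7    # human texts wrongly flagged as AI
tp = 745  # AI texts correctly flagged as AI
fn = 5    # AI texts that slipped through as human

print(f"accuracy: {accuracy(tp, tn, fp, fn):.1%}")
print(f"false positive rate: {false_positive_rate(fp, tn):.1%}")
```

With these made-up counts, the tool is right on 988 of 1,000 texts (98.8%) while still flagging 7 of the 250 human writers (2.8%).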

In Originality’s defense, the company provides a detailed explanation of how you should interpret the information right below the results, along with links to more detailed writeups about how to use the tool. It seems that isn’t enough, though. Gizmodo spoke to multiple writers who said they had to argue with clients who misunderstood the Originality tool.

Originality has published numerous blog posts and studies about accuracy and other issues, including the dataset and methodology it used to develop and measure its own tools. However, Weber-Wulff at the University of Applied Sciences for Engineering and Economics in Berlin said the details about Originality’s methodology “were not that clear.”

A number of experts Gizmodo spoke to, such as Juhasz of Undetectable AI, said they had concerns about businesses across the AI detection industry inflating their accuracy rates and misleading their customers. Representatives for GPTZero and Originality.AI said their companies are committed to openness and transparency. Both companies said they go out of their way to provide clear information about the limitations and shortcomings of their tools.

It might feel like being against AI detectors is being on the side of writers, but according to Gillham the opposite is true. “If there are no detectors, then the competition for writing jobs increases and as a result the pay drops,” he said. “Detectors are the difference between a writer being able to do their work, submit content, and get compensated for it, and somebody being able to just copy and paste something from ChatGPT.”

On the other hand, all of the copywriters Gizmodo spoke to said the AI detectors are the problem.

“AI is the future. There’s nothing we can do to stop it, but in my opinion that’s not the issue. I can see lots of ways AI can be useful,” Mark said. “It’s these detectors. They are the ones that are saying with utmost certainty that they can detect AI writing, and they’re the ones who are making our clients on edge and paranoid and putting us out of jobs.”

This article has been updated to include comment from Grammarly’s Jenny Maxwell.