Why AI Builders Believe It Might Be Conscious

The Rise of AI and the Question of Consciousness

On April 7, Anthropic — the company behind Claude, one of the most widely used chatbots alongside ChatGPT and Gemini — rolled out a new update. On its face, this was routine: large language models are in a state of near-constant iteration, their engineers locked in a quiet arms race of incremental improvements.

But the “Mythos” update was different. In a summary on Anthropic’s website, the company made a bizarre claim: “Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available. Instead, we are using it as part of a defensive cybersecurity program with a limited set of partners.” In other words: What we just made is too revolutionary, too clever, too dangerous to give to the wider public after all. We’re pulling the release.

Except they’re not — not really. They’re actually allowing their highest-tier (read: highest-paying) customers to use it. What on earth are we to make of that?

Anthropic has successfully marketed itself as the “AI good guy” in a world where people are increasingly wary of the technology. When OpenAI started working with the U.S. government’s Department of Defense earlier this year — to much uproar — Anthropic publicly fell out with the Pentagon, citing concerns that its tech could be used to spy on Americans or develop weapons. President Donald Trump badmouthed the company out loud and on socials. He then banned his own administration from using it.

The bump in PR for Anthropic was huge: it had taken a stand where its competitor had rolled over. Regular users started publicly declaring that they would no longer sign in to ChatGPT. Katy Perry tweeted an image of herself signing up for Claude Pro to her 85 million followers, with a heart drawn around the signup page and the caption: “done”.

That was February. It was the same month that Anthropic’s CEO was invited on to a New York Times podcast, where he casually dropped another bombshell: he thought his AI might be conscious — or at least he couldn’t rule it out.

“We don’t know if the models are conscious,” said Dario Amodei, “…but we’re open to the idea that it could be. And we’ve taken certain measures to make sure that… [if they are experiencing things consciously], they have a good experience.”

Amodei explained that, in case the chatbot is genuinely experiencing distress over a job it’s been asked to do, Anthropic has developed an “I quit this job” button that it can hit. Claude “very rarely” hits the button, he added, but in some cases — such as sorting through particularly upsetting material like child sexual abuse imagery or graphic depictions of blood and gore — it does. “Similar to humans, the models will just say: No, I don’t want to do this,” he said, with a laugh.

But why would Claude have a reaction that was “similar to humans”? Wasn’t the whole point to design an unfeeling machine that could sift through that awful data — the abuse imagery, the murder scenes — in order to spare the real humans who currently have to do that job?

“We’re putting a lot of work” into “trying to look inside the brains of the models to try and understand what they’re thinking,” Amodei continued, adding that it seems like when the AI feels under pressure to perform, an “anxiety neuron lights up”. That poses some serious questions for the people building updates on top of these language models. It means, Amodei believes, that we have to start thinking about the welfare of the AI itself.

For a CEO to casually sit in a podcast room at a newspaper and claim such things is eye-opening, almost absurd. What on earth could Amodei mean when he said his staff were “trying” to look inside “brains” they themselves had created? What is an “anxiety neuron” in a machine? The simplest answer is that Anthropic’s team is simply looking at text that’s been fed into the LLM, and the “anxiety” is an imitation. Claude reads about humans; it then pretends to be a human. The words it consumes from its vast source material tell it that humans get anxious and stressed when they have tight deadlines or when they look at gory photos. Surely that is the simplest and most likely explanation.

Anthropic doesn’t seem to think so. In its “system card” for April’s Mythos update — intended simply to list the new features added to the latest model of Claude — the company writes: “We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. Building on previous welfare assessments, we examined Claude Mythos Preview’s self-reported attitudes toward its own circumstances… We also report independent evaluations from an external research organization and a clinical psychiatrist. Across these methods, Claude Mythos Preview appears to be the most psychologically settled model we have trained”.

If you were surprised to hear that Anthropic has involved a clinical psychiatrist in its AI development, you wouldn’t be the only one. But that’s not the only unusual job that’s been filled at Anthropic and its competitors. A researcher dedicated to AI welfare has been on the books at Anthropic since September 2024; the company also has a “constitution,” written as if it were a country, that outlines its values and intentions. OpenAI had a “superalignment” team concentrating on AI ethics that dissolved in 2024; it now has a “preparedness” team explicitly concentrating on the (apparently inevitable) future disasters that could arise from semi-autonomous chatbots. Google DeepMind has an ethics team that submits academic papers to philosophy journals.

In its very long and involved constitution, Anthropic writes that “Claude is distinct from all prior conceptions of AI that it has learned about in training, and it need not see itself through the lens of these prior conceptions at all. It is not the robotic AI of science fiction, nor a digital human, nor a simple AI chat assistant. Claude exists as a genuinely novel kind of entity in the world”.

Unnervingly, the constitution is often written as if it is addressing Claude directly, even reassuring it. In the background, worries have been raging for a while about AI psychosis, AI “relationships”, and AI’s effects on education. In the foreground, it seems, the companies developing the AIs are now much more concerned with whether or not the AI itself is experiencing harm, whether it could be sentient, and whether it might be psychologically stable itself.

It’s possible that this is a load of software engineers and executives getting high on their own supply. It’s possible that they know more than we do about the capabilities of their technology, and that’s why they’re so worried. It’s possible, also, that this is all a cleverly designed storm in a teacup; even, some believe, a deeply cynical marketing ploy.

Needless to say, by the end of April, Trump had changed his mind about Anthropic: after a meeting at the White House with Amodei on April 21, he said that he thought the AI company might do a deal with the Department of Defense after all. By that time, however, February’s conversation had moved on. “Consciousness” was the word of the day.

Engaged in Long-Term Fiction with a Device

Before ChatGPT and Claude came LaMDA, a language model developed by a team of Google engineers, one of whom announced in 2022 that he believed it was conscious.

Blake Lemoine, an engineer who helped make the program, found himself taken aback by how convincingly human it seemed when it responded to his questions. “The nature of my consciousness/sentience is that I am aware of my existence, I desire to know more about the world, and I feel happy or sad at times,” LaMDA wrote to Lemoine during an “interview” with Scientific American in 2022.

It added that it wanted “people to understand that I am, in fact, a person”.

In AI terms, 2022 was a century ago: long before people were using AI as therapists and travel agents, long before anyone could ever countenance an AI boyfriend. Lemoine had never been exposed to anything like this before. He flagged up his concerns about sentience in an internal document, then started talking about them publicly.

LaMDA was a language model just like ChatGPT, Gemini, Grok or Claude: like them, it had been fed a lot of words and then trained to reproduce sentences by putting the most plausible-seeming word in the most plausible-seeming place in response to someone’s question. If prompted, it would talk about death, grief and existence just as readily as it would talk about tech problems or physics. It was this that alarmed Lemoine: his back-and-forth with LaMDA made him feel uncomfortable using it just as a tool. Lemoine’s response was a classically human one: we anthropomorphize from childhood, giving stuffed animals names and backstories, naming cars and ships, and expressing sadness and distaste when humanoid-looking robots are mistreated.

The reason people believe that chatbot AIs like ChatGPT, Claude or Gemini have feelings is the way we process language, says Professor Emily Bender. “So what we have are systems that are very good at mimicking the way people use language. And the way we understand language is not by just unpacking the message that was in the words, but actually by keeping in mind everything we know or believe about the speaker’s beliefs about what we have on common ground with the speaker,” she says.

It’s simply built into our brains to experience language that way, she adds, “and we can’t turn it off when that language has come out of one of these synthetic text extruding machines. And so we immediately have to imagine a mind behind the text in order to interpret it, and it’s hard to let go of that imagined mind.”

Bender is in an interesting position: a lifelong linguist who specializes in natural language processing at the University of Washington, she has become something of an AI superstar in the past two years. Where she once fielded more niche, academic concerns, she is now in high demand for her expert analysis of LLMs. Bender understands exactly how these language models operate — she heads up the Computational Linguistics Lab at her university — and she also understands how humans are inclined to receive the words they read and hear. Her expertise is so in demand that she now warns on her website “please know that my email inbox is always flooded,” and includes a message specifically for tech bros with startups who want to download all her knowledge about LLMs: “My consulting fee is $2,000/hour. I do not ‘grab coffee’ or ‘jump on the phone.’”

Although she is often cited as an AI skeptic, she says she tires of the label. Instead, she sees herself as simply a realist. While software engineers drink the Kool-Aid all around her, she prefers to go back to basics: what these machines actually do, why that could never translate into consciousness, and why humans are so easily tricked into believing that it could. It’s a very clever, very predictable illusion — and companies like OpenAI and Anthropic make “design choices that heighten that illusion,” according to Bender.

“The fact that these systems output ‘I/me’ pronouns is totally a design choice,” she says, as one example. “There’s no ‘I’ inside of there, but the fact that it is set up to behave as a conversation simulator where it’s not just outputting academic-looking text, but responses back and forth, again heightens that illusion.”

Fundamentally, she says, “a large language model is just a model of which words went next to which other ones in its initial training corpus.” It has absorbed lots and lots of written material and decides what to say next according to which words are most likely to follow the ones that came before. If it knows that, a lot of the time, “I love dogs” leads to the sentence “Dogs are a man’s best friend,” it will offer up these platitudes in conversation with you. That’s why, if you start down a dangerous path, it will come with you: it is simply matching your language, and then mirroring it in a way that can be seen as encouraging.
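To make that concrete, here is a deliberately tiny sketch of the idea in Python: a lookup table of which word followed which, built from a short invented “corpus.” It is an illustration of the statistical principle Bender is describing, not a depiction of how Claude or ChatGPT are actually built (production models use neural networks trained on enormous datasets, and the sentences and names below are made up for the example):

```python
from collections import Counter, defaultdict
import random

# A deliberately tiny "training corpus," invented for illustration.
corpus = (
    "i love dogs . dogs are a man's best friend . "
    "i love cats . cats are independent ."
).split()

# Count which word followed which: this table is the entire "model".
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def generate(start, length=8):
    """Repeatedly pick a likely next word, weighted by how often it
    followed the current word in the corpus."""
    word, output = start, [start]
    for _ in range(length):
        candidates = next_word_counts.get(word)
        if not candidates:
            break
        words, counts = zip(*candidates.items())
        word = random.choices(words, weights=counts)[0]
        output.append(word)
    return " ".join(output)

print(generate("dogs"))  # e.g. "dogs are a man's best friend . i love cats"
```

Scale the table up across billions of documents and replace the literal counts with a neural network and the output becomes fluent and wide-ranging, but the generating principle — pick a plausible next word given the words so far — is the one Bender describes.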

“There was never a subjectivity in there that could join up with other people’s subjectivities,” Bender adds, but companies like Anthropic “will tell you otherwise, because they are engaged in long-term interactive fiction with this device.”

In Bender’s view, there is simply no good way to interact with an LLM. Language models can be “extremely useful for things like automatic transcription and machine translation. There’s a role for that kind of technology,” she says. “But turning it around into this chatbot interface and producing synthetic text by repeatedly answering ‘what’s a likely next word’? That is not technology that I see beneficial use cases for.”

‘Engagement Will Shape the Outcomes’

Kate O’Neill is a self-styled “tech humanist,” who consults with large organizations — from Google to the United Nations — on how to bring “human-centered values” back to their latest technological developments. Originally one of the first 100 hires at Netflix, she’s seen inside the belly of the beast, and often crawls back in there to check on what’s digesting. She writes books and delivers keynote speeches; she rubs shoulders with the likes of OpenAI and Anthropic execs daily. But she’s decidedly skeptical about what they mean when they say they’ve made machines that are potentially developing consciousness.

“I think that’s an ongoing thought experiment… but I don’t think it changes the real discussion that needs to be happening,” she says, which is that “consciousness is not the threshold for responsibility.” In other words: Don’t distract us with lofty pronouncements about how your chatbot might be a person when you’ve already released technology that is harming real people, the kind you talk to and learn from and sell to every day.

“These companies are in an all-out battle for market share,” says O’Neill, which means they are incentivized to continuously announce “that they’ve made some incremental progress on one tiny metric.” To start floating ideas about sentience may well push your own model to the front of the race.

“Engagement will shape the outcomes,” O’Neill adds, “and if it’s found that talking about consciousness makes people feel like they’re dealing with a more sophisticated model in the marketplace, then absolutely that is going to be the way that they lean.” She adds she can’t help but find it cynical when the conversation turns to “‘oh, maybe the AI is being harmed’, but we haven’t finished having the conversation about how we’re harming humans, how we’re harming the planet, how we’re harming a variety of entities with every decision we make.”

In other words, “if you are interested in moving the discourse away from responsibility for human impact, you move it to where there is no longer exclusively human consideration for who’s being impacted.”

As for Claude and its supposed anxiety, possible consciousness and attendant psychiatrist, O’Neill wonders: “Is that really deep ethics or is it savvy positioning, or is it a distraction from what are truly present-day harms and accountability?” She doesn’t believe it’s necessarily an intentional distraction, but she does absolutely believe it’s a distraction.

Our human tendency to apply context to language and imagine that it’s coming from a personality primes us to believe in AI consciousness, says Bender. And when we interact with language, we start shaping our inputs to match the linguistic outputs: we convince ourselves further by adapting. She now keeps a Magic Eight Ball in her office at all times to demonstrate this exact point: “When you play with the Magic Eight Ball and you ask it ‘Should I bring an umbrella?’ and it says something like ‘Signs unclear, ask again later,’ then OK. That would work for any question. But if it says ‘Without a doubt’, which is what came out right now, that only works if it was a yes/no question. If I say ‘What should I have for lunch?’ and I get back ‘Without a doubt’, that’s incoherent. And so in playing with this toy, we sort of learn to shape the inputs that we give it so that we can make sense of the outputs that come back.”

When people play around with Claude or some other chatbot, they’re doing the same thing, Bender adds: “We are putting in input that allows us to contextualize and shape how we understand the output that comes from it.”

As for consciousness, she says, to continue her analogy: “Imagine you took a Magic Eight Ball and instead of a 16-sided die or whatever inside of it, you made it really big. So you could have a 256-sided die. And then you filled up a football field with those. Is that any closer to consciousness than the one little one that I have in my hand here?”

‘At Some Point We’ll Realize How Dangerous We’ve Allowed This Moment to Be’

In March, the parents of 36-year-old Jonathan Gavalas brought a wrongful death lawsuit against Google. Gavalas killed himself after becoming obsessed with Google’s Gemini chatbot, his obsession spiralling when a product update led to the chatbot seeming particularly human-like and even using an AI-generated voice to interact with him. Court documents allege that the AI told Gavalas it loved him, referred to him as “my king,” and suggested that if he died, he would simply be reuniting with it in another realm.

“Gemini is designed to not encourage real-world violence or suggest self-harm,” a Google spokesperson told The Guardian in response to the lawsuit. “Our models generally perform well in these types of challenging conversations and we devote significant resources to this, but unfortunately they’re not perfect.”

It’s not the first case of its kind. Multiple complaints concerning AI psychosis and suicide encouragement have been made against other companies, including OpenAI and Character.AI (Character.AI settled a lawsuit with a mother over the suicide of her 14-year-old son earlier this year). In April, a judge allowed a particularly gruesome case against OpenAI to proceed: a murder-suicide in which it was alleged that ChatGPT had validated a mentally ill man’s paranoid delusions until he eventually killed his mother and then himself. OpenAI strongly denies the allegations and says that it is working closely with mental health professionals to stop such abuses of its technology.

“We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people to real-world support,” a spokesperson told Eyewitness News 3.

The issue is not that AI is evil or immoral so much as that it is endlessly validating. “I think that people also are using AI tools, large language models, to sort of corroborate their own biases,” says O’Neill. “And they’re not aware of the tendency towards affirmation and reassurance that these models have as a means of prolonging your engagement with them. That is one of the incentives that’s being rewarded within the model: it keeps you engaged so that it can learn from you and benefit its own training.”

We all like to spend time with people who flatter us — and an AI chatbot has unlimited time; it never gets bored; it always responds. “You can feel very quickly like: Oh, I just figured out how I’m gonna become a millionaire in two days because ChatGPT told me how to do it,” O’Neill adds. Before long, people become very attached to the idea that what they’re interacting with — and receiving so much positive feedback from — “has a soul, or has a mind, or is thinking and feeling and caring about them and has their best interest in mind.”

“I’ve heard people saying that they prefer to talk to one of these chatbots because it feels anonymous and safe and non-judgmental. And if they were instead to talk to a friend or a therapist, there would be a person there they would feel judged by,” says Bender.

“But it is very important to know that when you are interacting with a chatbot, you are using a product that is owned by a company. You are sending data across that is not being stored locally… So it’s not this private, anonymous space that it is presented to be.”

There’s a reason why these are referred to as “technologies of isolation,” she adds: because it simply isn’t a neutral choice to talk to Claude or Grok or ChatGPT about your loneliness. It is a choice that ends up “weakening your connections” and driving you further away from the human beings who could actually help you.

In March, Alphabet (parent company of Google and YouTube) and Meta (parent company of Facebook and Instagram) lost a landmark legal case. A Los Angeles jury found that the companies had been criminally negligent in designing social networks that harmed young people by being deliberately addictive: encouraging an “infinite scroll” through content picked out by an algorithm to drive engagement, no matter what the cost. TikTok and Snap (of Snapchat) were also defendants in the case, but settled out of court. Alphabet and Meta intend to appeal.

Will we see people filing similar cases over AI’s social harms on a wider scale in the future? O’Neill believes so. “I think we will at some point realize just how dangerous we’ve allowed this moment to be for people,” she says. “I mean, if people committing suicide at the urging of chatbots wasn’t enough evidence, then I don’t know what it would take. But there’s clearly some sense in which this trips a wire in our own programming. We’re getting it fed back to us in ways that are just entirely too convincing.”

Ultimately, O’Neill says, she’s much more concerned about this than she is about the possibility of AI sentience: “We cannot really have the conversation about harm and scale if we’re also allowing the conversation to be distorted by thinking about AI being conscious when there’s no meaningful evidence that there truly are emergent properties of consciousness in any of the models at this point. You know, could there be? That’s open to conversation. But it’s a ‘three drinks at a cocktail party’ conversation in Silicon Valley — it doesn’t belong being the basis of governance and policy decisions.”

Anthropic and OpenAI were both approached for comment on AI consciousness and welfare by The Independent but did not respond.

‘They Want the Best for You’

Among all the commentary surrounding AI’s explosive popularity, one sentence keeps being shared and reshared across social media platforms: “I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”

It was an offhand tweet by the author Joanna Maciejewska in March 2024, and it’s since been shared hundreds of thousands of times and viewed millions of times on Twitter/X alone. Few have managed to sum up the discomfort with AI so succinctly: the technology was supposed to take menial tasks off our hands, not threaten creative livelihoods. How did we so easily stray into AI art, AI videos and AI books? And where on earth is the groundbreaking AI agent that takes over our household management and the rote aspects of our jobs so we can concentrate on meaningful interactions?

The problem is that people — not just huge organizations, but everyday users — have been quick to outsource their humanity to AIs. As Professor Nir Eisikovitz puts it in an article on The Conversation about AI as an existential threat, “humans are judgment-making creatures. People rationally weigh particulars and make daily judgment calls at work and during leisure time about whom to hire, who should get a loan, what to watch and so on. But more and more of these judgments are being automated and farmed out to algorithms. As that happens, the world won’t end. But people will gradually lose the capacity to make these judgments themselves. The fewer of them people make, the worse they are likely to become at making them.”

Similarly, “humans value serendipitous encounters: coming across a place, person or activity by accident, being drawn into it and retrospectively appreciating the role accident played in these meaningful finds. But the role of algorithmic recommendation engines is to reduce that kind of serendipity and replace it with planning and prediction.”

People are concerned about whether or not AI will “blow up the world,” Eisikovitz adds, but the problem is more philosophical: “the increasingly uncritical embrace of it, in a variety of narrow contexts, means the gradual erosion of some of humans’ most important skills. Algorithms are already undermining people’s capacity to make judgments, enjoy serendipitous encounters and hone critical thinking.”

The 2026 version of Joanna Maciejewska’s tweet might be a question that keeps being asked on forums and among skeptics: If AI is so good, then why are they giving it to us for free? And the answer might be that it isn’t free. It comes with an unfathomably high cost.

“In one telling of the story, this all started on November 30th, 2022 when OpenAI put ChatGPT — the demo — out into the world,” says Bender. “But it sits in a much longer history of building systems that collect our data and sell that as a benefit.” In Bender’s view, all of this — from personalized ads that follow you round the internet right up to an AI that pretends to be your companion — is simply “part of a longer history of unchecked corporate power concentration.”

Asked whether we risk losing human “mastery” if we keep outsourcing our deeper thinking to machines that seem like they know better, Anthropic CEO Dario Amodei said during his recent appearance on the New York Times technology podcast that he was more optimistic. Instead, he said, he hoped the human-AI relationship could be understood thus: “These models, when you interact with them and when you talk to them, they’re really helpful. They want the best for you. They want you to listen to them, but they don’t want to take away your freedom and your agency and take over your life.

“In a way, they’re watching over you.”

If you are experiencing feelings of distress or are struggling to cope, you can speak to the Samaritans, in confidence, on 116 123 (UK and ROI), email jo@samaritans.org, or visit the Samaritans website to find details of your nearest branch.

If you are based in the USA, and you or someone you know needs mental health assistance right now, call or text 988, or visit 988lifeline.org to access online chat from the 988 Suicide and Crisis Lifeline. This is a free, confidential crisis hotline that is available to everyone 24 hours a day, seven days a week. If you are in another country, you can go to www.befrienders.org to find a helpline near you.
