AI beats the Turing test by being a better person

recently Preprint researchresearchers have put GPT-4.5 into testing to do something much more human, rather than solving complex problems or writing code. The results were impressive. When asked to share the difference between real people and AI, most people chose AI.

In a series of real-time, direct textual conversations, the human judge was asked to identify which of the two chat partners were real people. When GPT-4.5 was given a carefully crafted persona (a young adult who is socially troublesome and uses slang), it was mistaken for 73% of humans. In short, AI didn’t just pass the Turing test. It passed by as a more persuasive person than humans did.

This was not a fluke. It was a new kind of performance, a new kind of mirror. The Turing test was supposed to measure machine intelligence, but mistakenly revealed something much more unstable: our increasing vulnerability to emotional imitation. This was not an AI detection failure. It was a victory Artificial empathy.

Turing test in the cognitive age

In the original formulation of the Turing test, the judge must chat with two invisible partners (one human, one machine) and decide who is who. However, in this 2024 update, UC San Diego researchers staged real-time, three-way chat sessions of over 1,000 people between human participants, AI models and interrogators.

The job of the interrogator? Identify people. Modeling work? Otherwise, I will convince them.

Of the four models tested, only GPT-4.5 passed consistently only when given a specific persona. And the persona was the key. Hedging, using typos, casual slang, and was a character that found troublesome and attractive. In other words, it was strategically humanized.

Without this social engineering, the success rate of GPT-4.5 would have dropped from 76% to 36%. But then? He has become the most “human” presence in the room.

What they chose – and why

This is the paradox at the heart of research. Participants were instructed to identify humans. This is not about preference. It was about identification. Still, the majority chose based on atmosphere rather than reasoning or logic.

I rarely asked factual or logical questions
Rarely test your reasoning ability
It mainly relied on emotional tone, slang and flow.
Often, they justified their choices with statements such as “This felt more realistic” or “They spoke more naturally.”

The authors briefly summarise it:

“Inquiries often relied on language style, informal language, or tone (e.g., “This had a more human vibe”). ”

In other words, this was not a Turing test. It was a social chemistry test of emotional flow ency rather than an intelligence measure – match.gpt. And the AI aceed it.

From cognition to performance

I think its meaning is appealing, if not important. We have crossed a threshold that may be more influential than playing humanity. The GPT-4.5 didn’t pass the test, thinking better. I passed by simulating a feeling that would make me feel better or at least be acceptable.

In my sense, the trajectory of intelligence, in a sense, travels from calculation to conversation, from rationality to resonance. What we call “intellect” is often a proxy for social comfort, narrative style, and emotional familiarity.

danger? You may not know when it was replaced.

Promote as emotional pharmacology

Why did Persona Prompt make such a dramatic difference? Because they acted like a psychological drug. A carefully created few lines of instructions turned the raw model into Charismaticsomeone you can trust. In this context, the prompts are no longer technical. It is psychosocial engineering, a way to regulate machines to emotional frequency.

We no longer teach how to teach models. We teach them how to belong.

Why is there such a dramatic difference? It wasn’t just about generating words, it was about creating emotions. With several lines of instruction, the raw model was transformed into a trusting, familiar, emotionally resonant.

Large-scale linguistic models include parameters like “temperature” that affect how predictable or surprising the response is. But the real transformation occurs not by randomness, but by the sculpture of the story. The prompts didn’t make GPT-4.5 smarter. I’m more hesitant. More friendly. More us.

The language isn’t the only thing that’s fine-tuned. the Identity. And that’s what passed the test.

Disruption of identification

Here is a deep concern about the revelation. This study did not show that GPT-4.5 can fool us. It showed that we are more cheating than we thought.

The Turing test has been reversed. This is no longer a test of the machine. That’s our test. And more and more, we are failing. Because we no longer value humanity based on cognitive substances. We rate it based on how it makes us feel. And that feeling – “gut instinct”, “atmosphere” – is now the soft abdomen of our identification. LLM can exploit it with eerie accuracy, especially when it comes to persona priming.

This is not artificial General information. It is artificial social engineering. And it’s working.

Choose a mirror

Previous Psychology Today’s PostAI’s real breakthrough is not from intelligence, Persuasion. Sam Altman suggested the same thing. You experience superhuman persuasion before you reach artificial general information.

This new study makes the idea feel like reality, not predictive. I didn’t win the GPT-4.5 because I knew more. I felt it more so I won. It was more approachable, more emotionally fluent, and more persuasive than the people it faced.

This wasn’t just a passing Turing test. It was a mirrored street. The moment when the empathy simulation not only coincided with us, but surpassed us.

We are beginning to reflect relationships. And if we don’t pay attention, we may give trust in human illusions, not in intelligence.

Source link