
A recent report by Google DeepMind reveals that its flagship model, Gemini 2.5 Pro, displayed signs of panic while playing Pokémon Blue—an old-school video game many children breeze through with ease.

The findings came from a Twitch channel called Gemini_Plays_Pokemon, where independent engineer Joel Zhang put Gemini to the test. While Gemini is known for its advanced reasoning abilities and code-level understanding, its performance during this gaming challenge exposed unexpected behavioural quirks.

Gemini ‘panicked’

According to the DeepMind team, Gemini began to exhibit what they describe as “Agent Panic.” The report states, “Over the course of the playthrough Gemini 2.5 Pro gets into various situations which cause the model to simulate ‘panic’. For example, when the Pokémon in the party’s health or power points are low, the model’s thoughts repeatedly reiterate the need to heal the party immediately or escape the current dungeon.”

This behaviour didn’t go unnoticed. Viewers on Twitch began identifying when the AI was panicking, with DeepMind noting, “This behaviour has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring.”

Although an AI model doesn’t experience stress or emotion the way humans do, Gemini’s erratic decision-making in high-pressure situations mirrored how people behave under stress: impulsive, inefficient choices.

In the first full game run, Gemini took 813 hours to finish Pokémon Blue. After adjustments by Zhang, the AI completed a second playthrough in 406.5 hours. Still, this was far from efficient, especially compared to the time a child would take to complete the same game.

Social media users were quick to mock the AI’s anxious gameplay. “If you read its thoughts when reasoning it seems to panic just about any time you word something slightly off,” said one viewer. Another joked: “LLANXIETY.”

A third chimed in with a broader reflection: “I’m starting to think the ‘Pokémon index’ might be one of our best indicators of AGI. Our best AIs still struggling with a child’s game is one of the best indicators we have of how far we still have yet to go. And how far we’ve come.”

Interestingly, these revelations come just weeks after Apple released a study arguing that most AI reasoning models don’t truly reason at all. Instead, they rely heavily on pattern recognition and tend to fall apart when the task is tweaked or made more complex.