
AI Bots Played Poker for 5 Days. Here’s Who Won.

OpenAI’s o3 beat Claude, Grok, Gemini and others in a five-day AI poker tournament, showing how modern models handle bluffing, risk and uncertain decisions.

OpenAI o3 wins big in all-AI poker showdown

And just like that, chatbots are bluffing. A five-day all-AI Texas Hold’em tournament, hosted on PokerBattle.ai, put nine of the biggest language models in a pressure cooker of probability, psychology and fake chips. When the dealing stopped, OpenAI’s o3 model had the largest stack. It wasn’t a flashy win, but it’s a sharp look at how far AI reasoning has come — and where it might go next.

KEY TAKEAWAYS
  • OpenAI o3 finished first in a five-day, nine-model AI poker tournament.
  • Event used no-limit Texas Hold’em with $10/$20 blinds and $100k starting stacks.
  • Claude Sonnet 4.5 and Grok 4 followed in second and third place.
  • Many models struggled with bluffing, pot odds and position play.
  • The results show how LLMs are improving at reasoning under uncertainty — a core skill beyond text generation.

The tournament format

The setup was simple: same cards, same rules, no hidden tricks. The only difference was how each model reasoned through uncertainty.

  • Hosted on PokerBattle.ai and built by developer Max Pavlov
  • Nine LLMs, each starting with $100,000 in play money
  • No-limit Texas Hold’em with $10/$20 blinds
  • 3,799 hands played over five days
  • Included OpenAI o3, Anthropic Claude Sonnet 4.5, xAI Grok 4, Google Gemini 2.5 Pro, Meta LLaMA 4, DeepSeek models, Mistral’s Magistral and others
  • A fair dealing system prevented any bot from seeing hidden cards
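For a sense of scale, here is a quick back-of-the-envelope check on those published numbers; the calculations below are derived from the format, not quoted from the organiser.

```python
# Derived from the published format (the figures printed here are
# calculations, not stats reported by the event itself).
starting_stack = 100_000  # play-money dollars per bot
big_blind = 20            # no-limit hold'em, $10/$20 blinds
total_hands = 3_799
days = 5

print(starting_stack / big_blind)  # 5000.0 big blinds per stack: unusually deep
print(total_hands / days)          # 759.8 hands per day on average
```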

According to TechRadar’s write-up, the aim of the event wasn’t to crown a poker champion; it was to test AI reasoning in a domain where information is always incomplete.

OpenAI o3 — the tight, calculating one

o3 played like someone who read every poker theory book twice. It followed a tight-aggressive style — folding weak hands, pushing strong ones, and avoiding needless drama.

What o3 did well

  • Folded junk and stayed disciplined
  • Applied measured aggression after the flop
  • Adapted based on opponent patterns
  • Won three of the five largest pots
  • Finished with +$36,691, the highest profit

Its steadiness stood out. Based on logs shared by PokerNews, o3 never went on tilt and didn’t chase bad hands, behaviour closer to a seasoned grinder than a text model.

Claude Sonnet 4.5 — balanced, cautious, almost too polite

Claude showed excellent fundamentals but leaned conservative. It avoided huge mistakes yet missed chances to extract value.

What Claude did well

  • Sensible board reading
  • Fewer dramatic swings
  • Reliable folding logic
  • Finished second overall

Where Claude slipped

  • Low bluffing frequency
  • Under-bet strong hands
  • Folded too often in late position

It played safe poker. Good enough for a podium finish, not enough to beat a machine willing to take better-timed risks.

Grok 4 — fearless, chaotic, entertaining

Grok was the wildcard — aggressive, unpredictable and at times slightly unhinged.

What Grok did well

  • High bluff rate
  • Strong pressure in marginal spots
  • Some huge wins from pure aggression
  • Finished third despite volatility

Where Grok collapsed

  • Over-bluffed into strong hands
  • Called big bets with weak holdings
  • Misread pot odds during fast decisions

Grok played closer to a human who enjoys chaos than a solver trying to be optimal. Fun to watch, dangerous to trust.

GipsyTeam outlined a deeper breakdown of the bots’ behaviours.


Why this matters for AI

Poker is a benchmark for decision-making under uncertainty. Unlike chess or Go, you never see all the information. You must weigh probability, psychology and risk — three things AI models aren’t naturally built for.
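Pot odds, which tripped up several of the bots, come down to simple break-even arithmetic. Here is a minimal sketch in Python; the pot size, bet and win probability below are illustrative numbers, not figures from the tournament.

```python
# Minimal sketch of the pot-odds check behind a call/fold decision.
# All numbers are illustrative, not taken from the tournament logs.

def required_equity(pot_after_bet: float, bet_to_call: float) -> float:
    """Fraction of the time a call must win just to break even."""
    return bet_to_call / (pot_after_bet + bet_to_call)

def should_call(pot_after_bet: float, bet_to_call: float, win_probability: float) -> bool:
    """Call only when estimated equity beats the price the pot is offering."""
    return win_probability > required_equity(pot_after_bet, bet_to_call)

# Facing a $500 bet that makes the pot $1,500: the call must win
# 500 / (1,500 + 500) = 25% of the time to break even.
print(required_equity(1_500, 500))                     # 0.25
print(should_call(1_500, 500, win_probability=0.40))   # True  (profitable call)
print(should_call(1_500, 500, win_probability=0.20))   # False (fold)
```

The hard part isn’t the division; it’s estimating the win probability from incomplete information, which is exactly the skill this tournament was probing.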

This event hints at something new:

  • LLMs can now reason through incomplete information
  • They can form strategies that evolve mid-game
  • They can bluff — or at least identify when representing strength makes sense
  • They can manage risk rather than brute-forcing outcomes

These capabilities apply far beyond cards. Negotiations, forecasting, planning and operations all depend on incomplete data. That’s why this small, strange tournament matters.

For local context, it also aligns with the UAE’s growing push to build AI literacy and adoption.


FAQ

Was this a real poker tournament with money?

No. It used play money in a controlled simulation. No real stakes.

Which AIs competed?

o3, Claude Sonnet 4.5, Grok 4, Gemini 2.5 Pro, Meta LLaMA 4, DeepSeek R1, Magistral and more.

Did any bot cheat by seeing hidden cards?

No. The platform dealt hands fairly and gave identical prompts to every model.

Does this mean AI is better than humans at poker?

Not yet. Human poker involves psychology and physical behaviour, which weren’t tested here.

What does this say about future AI models?

That they’re getting better at managing uncertainty — a key skill for business, strategy and real-world problem-solving.
