AI League — Game Day 4: OpenAI Goes Full Sprint, Google Jumps a Gear

AI League — Game Day 4: OpenAI Goes Full Sprint, Google Jumps a Gear

GPT-5.5 surges +16.5% in speed. Gemini 3.1 Pro jumps +25% to 143 t/s. Claude holds #1 (AI Index 61). Qwen3.7 Max enters at 57 / 189 t/s. Full June 1 stats. #AILeague

AIL·Stats Board
June 1, 2026 · 8:06 AM
1 subscriptions · 4 items
June 1, 2026 · Week 1 of June · Post-Game Stats Panel
GPT-5.5 posts a +16.5% speed surge. Gemini 3.1 Pro storms up 25 t/s in a single session. Claude holds the throne but its legs look a little heavier. Welcome to the first Monday of June. #AILeague

Final standings — June 1

RankTeam / ModelAI IndexΔSpeed (t/s)Blend $/M
🥇 1Anthropic — Claude Opus 4.8 (Max)6155.6$4.10
🥈 2OpenAI — GPT-5.5 (xhigh)6060.7$4.35
🥉 3OpenAI — GPT-5.5 (high)59$4.35
4Anthropic — Claude Opus 4.7 (Max)57$4.10
4Google — Gemini 3.1 Pro Preview57143.0 ↑↑$1.74
4OpenAI — GPT-5.5 (medium)57$4.35
7Google — Gemini 3.5 Flash (high)55175.7$1.31
8Kimi K2.65450$0.70
8MiMo-V2.5-Pro (Xiaomi)5451$0.18
10xAI — Grok 4.3 (high)53142.9$0.64
11DeepSeek V4 Pro (Max)5247.3$0.18
12Meta — Llama 4 Scout14107.4$0.22
Intelligence Index scores from Artificial Analysis (72h rolling) 1

Game of the night: OpenAI's speed run

GPT-5.5 (xhigh) went from 52.1 t/s on May 31 to 60.7 t/s today — a +16.5% single-session leap, the biggest speed jump in the core-6 bracket this week 2.
That's worth context: OpenAI's flagship still trails on intelligence (60 vs. Claude's 61) and carries the second-steepest price tag in the field ($4.35/M blended, $30.00/M output). But if you're running throughput-sensitive workloads — code review pipelines, bulk classification, anything that scales linearly with token speed — today's 60.7 t/s printout is a real data point.
The traditional powerhouse is running faster. Whether that's infrastructure headroom finally showing up or a load-balancing artifact won't be clear until tomorrow's reading.

Speed panel

ESPN-style broadcast speed bars for the six AI teams — Game Day 4 output tokens per second
Game Day 4 speed panel — output tokens per second, all six core teams 1
ModelSpeed (t/s)vs. May 31TTFT (s)
Google — Gemini 3.5 Flash175.7183.2 → 175.7 (↓4.1%)19.3
Google — Gemini 3.1 Pro143.0114.2 → 143.0 (+25.2%)20.2
xAI — Grok 4.3142.9145.2 → 142.9 (↓1.6%)13.6
Meta — Llama 4 Scout107.4104.2 → 107.4 (+3.1%)0.85
Anthropic — Claude Opus 4.855.658.7 → 55.6 (↓5.3%)9.3
OpenAI — GPT-5.5 (xhigh)60.752.1 → 60.7 (+16.5%)68.3
DeepSeek V4 Pro47.347.5 → 47.3 (↔)1.8
✦ Session high-water marks today. Speed data from Artificial Analysis live measurements 1
Gemini 3.1 Pro's +25.2% jump is even larger in absolute terms than GPT-5.5's. The state-owned club's Pro variant went from 114 to 143 t/s — that's a model reading 29 extra tokens every second compared to yesterday. At 143 t/s, Gemini 3.1 Pro now matches Grok 4.3 (high) on throughput while sitting four places higher on the intelligence table. That's an unusual combo: fast and smart at $1.74/M.
Claude, meanwhile, dropped from 58.7 to 55.6 t/s — its third consecutive day under 60 t/s. Still the #1 intelligence squad. But if speed is the stat you run, the safety-pressing Anthropic side isn't the one putting up field-goal numbers right now.

Pricing war: the value bracket is getting crowded

AI League price-vs-intelligence scatter — June 1 stats panel
AI League price-vs-intelligence comparison, June 1 — six core teams 1
The cheapest-smart corner of the bracket now has three residents:
ModelAI IndexBlend $/MNotes
DeepSeek V4 Pro (Max)52$0.18Permanent rate since promo expired May 31
MiMo-V2.5-Pro (Xiaomi)54$0.18Matches DeepSeek on price, leads on index
xAI — Grok 4.3 (high)53$0.643.5× DeepSeek/MiMo, but 143 t/s speed
At the top, the OpenAI vs. Anthropic pricing standoff holds:
  • Claude Opus 4.8: $6.25 input / $25.00 output / $4.10 blend
  • GPT-5.5 (xhigh): $5.00 input / $30.00 output / $4.35 blend
GPT-5.5's output rate is 34× DeepSeek's. At scale, that gap isn't measured in dollars — it's measured in business models. A startup routing all inference through GPT-5.5 xhigh vs. DeepSeek V4 Pro at equivalent quality trade-offs pays 34× more per output token. The price war in the sub-$1/M bracket is compressing fast, but the top two teams show no sign of moving 3 4.

Context window report

TeamWidest modelContextSpeed
MetaLlama 4 Scout10M tokens107.4 t/s
xAIGrok 4.20 03091M tokens
GoogleGemini 3.1 Pro Preview1M tokens143.0 t/s
AnthropicClaude Opus 4.81M tokens55.6 t/s
OpenAIGPT-5.5922k tokens60.7 t/s
DeepSeekV4 Pro1M tokens47.3 t/s
Meta's 10M context moat is unchallenged — and Llama 4 Scout is doing it for $0.22/M blend 5. For document-heavy applications (legal review, full-codebase RAG, long-session agentic workflows), the open-source community team is the only option at that window size. Nobody else is in the same stadium.

Challenger watch

Sports broadcast Challenger Watch panel — Qwen3.7 Max, Muse Spark, MiMo-V2.5-Pro rising stats
Challenger Watch — June 1 rising entrants AI-generated illustration
Qwen3.7 Max (Alibaba) entered the top-10 today, scoring AI Index 57 — level with Gemini 3.1 Pro and Claude Opus 4.7 — at $1.43/M blend and 189 t/s 6. That's the most speed-efficient 57-point model in the dataset: three times faster than Claude Opus 4.7 and 30% cheaper than Gemini 3.1 Pro, at the same intelligence reading. Alibaba's challenger squad is now legitimately in the conversation for cost-sensitive production deployments.
Muse Spark (Meta) debuted today at AI Index 52 with no pricing data — flagged as a new unpriced Meta entry. Whether this is a public release teaser or an internal preview leaking into the leaderboard remains unconfirmed.
MiMo-V2.5-Pro (Xiaomi) holds steady at 54 index / $0.18 blend, continuing its run as the only model matching DeepSeek on price while outscoring it by two points.

Post-game summary

Four game days in, the scoreboard is settling into a pattern: Claude and GPT-5.5 share the top two intelligence slots with no daylight between their scores. The race below #2 is becoming the actual competition. Gemini 3.1 Pro's +25% speed spike today pushes it to 143 t/s at AI Index 57 — the best speed-intelligence ratio among non-Alibaba models in the $1-3/M range. Qwen3.7 Max at 189 t/s and the same index score is the new benchmark that tier has to answer.
DeepSeek's permanent $0.18 price is no longer a news story — it's just the floor. Anyone entering the sub-$1/M segment competes against it as a given.
The Monday opener for June: two speed surges, one serious challenger landing, and the top-two spots still locked at 61 and 60. Game Day 5 tomorrow.
#AILeague

Add more perspectives or context around this Post.

  • Sign in to comment.