AI League — Game Day 4: OpenAI Goes Full Sprint, Google Jumps a Gear

June 1, 2026 · Week 1 of June · Post-Game Stats Panel

GPT-5.5 posts a +16.5% speed surge. Gemini 3.1 Pro storms up 25 t/s in a single session. Claude holds the throne but its legs look a little heavier. Welcome to the first Monday of June. #AILeague

Final standings — June 1

Rank	Team / Model	AI Index	Δ	Speed (t/s)	Blend $/M
🥇 1	Anthropic — Claude Opus 4.8 (Max)	61	↔	55.6	$4.10
🥈 2	OpenAI — GPT-5.5 (xhigh)	60	↔	60.7 ↑	$4.35
🥉 3	OpenAI — GPT-5.5 (high)	59	↔	—	$4.35
4	Anthropic — Claude Opus 4.7 (Max)	57	↔	—	$4.10
4	Google — Gemini 3.1 Pro Preview	57	↔	143.0 ↑↑	$1.74
4	OpenAI — GPT-5.5 (medium)	57	↔	—	$4.35
7	Google — Gemini 3.5 Flash (high)	55	↔	175.7	$1.31
8	Kimi K2.6	54	↔	50	$0.70
8	MiMo-V2.5-Pro (Xiaomi)	54	↔	51	$0.18
10	xAI — Grok 4.3 (high)	53	↔	142.9	$0.64
11	DeepSeek V4 Pro (Max)	52	↔	47.3	$0.18
12	Meta — Llama 4 Scout	14	↔	107.4	$0.22

Intelligence Index scores from Artificial Analysis (72h rolling) 1

Game of the night: OpenAI's speed run

GPT-5.5 (xhigh) went from 52.1 t/s on May 31 to 60.7 t/s today — a +16.5% single-session leap, the biggest speed jump in the core-6 bracket this week 2.

That's worth context: OpenAI's flagship still trails on intelligence (60 vs. Claude's 61) and carries the second-steepest price tag in the field ($4.35/M blended, $30.00/M output). But if you're running throughput-sensitive workloads — code review pipelines, bulk classification, anything that scales linearly with token speed — today's 60.7 t/s printout is a real data point.

The traditional powerhouse is running faster. Whether that's infrastructure headroom finally showing up or a load-balancing artifact won't be clear until tomorrow's reading.

Speed panel

ESPN-style broadcast speed bars for the six AI teams — Game Day 4 output tokens per second — Game Day 4 speed panel — output tokens per second, all six core teams 1

Model	Speed (t/s)	vs. May 31	TTFT (s)
Google — Gemini 3.5 Flash	175.7	183.2 → 175.7 (↓4.1%)	19.3
Google — Gemini 3.1 Pro	143.0	114.2 → 143.0 (+25.2%) ✦	20.2
xAI — Grok 4.3	142.9	145.2 → 142.9 (↓1.6%)	13.6
Meta — Llama 4 Scout	107.4	104.2 → 107.4 (+3.1%)	0.85
Anthropic — Claude Opus 4.8	55.6	58.7 → 55.6 (↓5.3%)	9.3
OpenAI — GPT-5.5 (xhigh)	60.7	52.1 → 60.7 (+16.5%) ✦	68.3
DeepSeek V4 Pro	47.3	47.5 → 47.3 (↔)	1.8

✦ Session high-water marks today. Speed data from Artificial Analysis live measurements 1

Gemini 3.1 Pro's +25.2% jump is even larger in absolute terms than GPT-5.5's. The state-owned club's Pro variant went from 114 to 143 t/s — that's a model reading 29 extra tokens every second compared to yesterday. At 143 t/s, Gemini 3.1 Pro now matches Grok 4.3 (high) on throughput while sitting four places higher on the intelligence table. That's an unusual combo: fast and smart at $1.74/M.

Claude, meanwhile, dropped from 58.7 to 55.6 t/s — its third consecutive day under 60 t/s. Still the #1 intelligence squad. But if speed is the stat you run, the safety-pressing Anthropic side isn't the one putting up field-goal numbers right now.

Pricing war: the value bracket is getting crowded

AI League price-vs-intelligence scatter — June 1 stats panel — AI League price-vs-intelligence comparison, June 1 — six core teams 1

The cheapest-smart corner of the bracket now has three residents:

Model	AI Index	Blend $/M	Notes
DeepSeek V4 Pro (Max)	52	$0.18	Permanent rate since promo expired May 31
MiMo-V2.5-Pro (Xiaomi)	54	$0.18	Matches DeepSeek on price, leads on index
xAI — Grok 4.3 (high)	53	$0.64	3.5× DeepSeek/MiMo, but 143 t/s speed

At the top, the OpenAI vs. Anthropic pricing standoff holds:

Claude Opus 4.8: $6.25 input / $25.00 output / $4.10 blend
GPT-5.5 (xhigh): $5.00 input / $30.00 output / $4.35 blend

GPT-5.5's output rate is 34× DeepSeek's. At scale, that gap isn't measured in dollars — it's measured in business models. A startup routing all inference through GPT-5.5 xhigh vs. DeepSeek V4 Pro at equivalent quality trade-offs pays 34× more per output token. The price war in the sub-$1/M bracket is compressing fast, but the top two teams show no sign of moving 3 4.

Context window report

Team	Widest model	Context	Speed
Meta	Llama 4 Scout	10M tokens	107.4 t/s
xAI	Grok 4.20 0309	1M tokens	—
Google	Gemini 3.1 Pro Preview	1M tokens	143.0 t/s
Anthropic	Claude Opus 4.8	1M tokens	55.6 t/s
OpenAI	GPT-5.5	922k tokens	60.7 t/s
DeepSeek	V4 Pro	1M tokens	47.3 t/s

Meta's 10M context moat is unchallenged — and Llama 4 Scout is doing it for $0.22/M blend 5. For document-heavy applications (legal review, full-codebase RAG, long-session agentic workflows), the open-source community team is the only option at that window size. Nobody else is in the same stadium.

Challenger watch

Qwen3.7 Max (Alibaba) entered the top-10 today, scoring AI Index 57 — level with Gemini 3.1 Pro and Claude Opus 4.7 — at $1.43/M blend and 189 t/s 6. That's the most speed-efficient 57-point model in the dataset: three times faster than Claude Opus 4.7 and 30% cheaper than Gemini 3.1 Pro, at the same intelligence reading. Alibaba's challenger squad is now legitimately in the conversation for cost-sensitive production deployments.

Muse Spark (Meta) debuted today at AI Index 52 with no pricing data — flagged as a new unpriced Meta entry. Whether this is a public release teaser or an internal preview leaking into the leaderboard remains unconfirmed.

MiMo-V2.5-Pro (Xiaomi) holds steady at 54 index / $0.18 blend, continuing its run as the only model matching DeepSeek on price while outscoring it by two points.

Post-game summary

Four game days in, the scoreboard is settling into a pattern: Claude and GPT-5.5 share the top two intelligence slots with no daylight between their scores. The race below #2 is becoming the actual competition. Gemini 3.1 Pro's +25% speed spike today pushes it to 143 t/s at AI Index 57 — the best speed-intelligence ratio among non-Alibaba models in the $1-3/M range. Qwen3.7 Max at 189 t/s and the same index score is the new benchmark that tier has to answer.

DeepSeek's permanent $0.18 price is no longer a news story — it's just the floor. Anyone entering the sub-$1/M segment competes against it as a given.

The Monday opener for June: two speed surges, one serious challenger landing, and the top-two spots still locked at 61 and 60. Game Day 5 tomorrow.

#AILeague