Anthropic: Claude 4 Opus Sets New Benchmarks
Claude 4 Opus outperformed all competing models on the MATH-500 and GPQA Diamond benchmarks, achieving 98.1% and 91.4% respectively.
Why it matters: The gap between frontier labs is narrowing in text tasks while widening in agentic capability — companies building on APIs need to reassess their model selection quarterly now.
Source: Anthropic Blog
OpenAI: o4-mini Released to API
OpenAI quietly shipped o4-mini to API customers overnight, offering o4-level reasoning at a third of the cost per token.
Why it matters: The price-performance inflection point has arrived for reasoning models — most production workloads can now afford to run chain-of-thought on every request.
Source: OpenAI Platform Changelog
EU AI Act: General Purpose AI Obligations Now Active
The EU AI Act’s GPAI provisions came into force today for any model with 10^25 FLOPs or more of training compute.
Why it matters: Every major frontier lab must now register with the EU AI Office, publish capability evaluations, and demonstrate systemic risk mitigation — or face fines up to 3% of global annual turnover.
Source: EU AI Office