Anthropic: Claude 4 Opus Sets New Benchmarks
Claude 4 Opus outperformed all competing models on the MATH-500 and GPQA Diamond benchmarks, achieving 98.1% and 91.4% respectively after a surprise release to the public API overnight.
Why it matters: The gap between frontier labs is narrowing in text tasks while widening in agentic capability. Enterprises building on APIs should expect quarterly model reassessment to become standard operating procedure; switching costs are lower than ever given prompt-format portability.
Source: Anthropic Blog
OpenAI: o4-mini Released to API
OpenAI quietly shipped o4-mini to API customers, offering o4-level reasoning at one-third the cost per token with a 128k context window and improved tool-calling reliability.
Why it matters: The price-performance inflection point has arrived for reasoning models. Most production workloads can now afford chain-of-thought on every request, which changes the architecture calculus for anything that previously had to batch or downsample.
Source: OpenAI Platform Changelog
EU AI Act: General Purpose AI Obligations Now Active
The EU AI Act's GPAI provisions came into force today, 1 April 2026, for any model trained with 10²⁵ FLOPs or more of compute.
Why it matters: Every major frontier lab must now register with the EU AI Office, publish capability evaluations, and demonstrate systemic risk mitigation, or face fines of up to 3% of global annual turnover. The compliance burden is expected to create demand for a new category of AI audit services.
Source: EU AI Office
Google DeepMind: Gemini Ultra 2 Preview Access Opens
Google DeepMind opened waitlist access to Gemini Ultra 2, which reportedly matches Claude 4 Opus on code generation benchmarks and exceeds it on multimodal reasoning tasks involving video.
Why it matters: Competition at the frontier is now genuinely three-way. Developers previously locked into a two-horse race between Anthropic and OpenAI now have a credible third option with tighter Google Cloud and Workspace integration.
Source: Google DeepMind Blog
Open Source: LlamaForge Agentic Framework Hits 50k GitHub Stars
LlamaForge, a lightweight Python framework for multi-agent LLM orchestration, reached 50,000 GitHub stars in under three months, making it the fastest-growing AI developer tool of 2026.
Why it matters: LlamaForge's composable tool-calling abstraction layer works across all major model APIs, reducing vendor lock-in for teams building agentic pipelines. Its rising adoption suggests the developer community is consolidating around an open standard faster than most predicted.
Source: GitHub