Claude Opus 4.6: What Actually Matters for Developers

Anthropic released Claude Opus 4.6 yesterday, and it’s a significant upgrade over its predecessor. I’ve been using it since launch, and here’s what stands out from a practical standpoint.

The coding improvements are real

Opus 4.6 plans more carefully, stays on task longer during agentic sessions, and handles larger codebases more reliably. It also catches its own mistakes better during code review and debugging. If you’ve ever watched Claude lose the thread halfway through a complex refactor, that’s the specific problem this addresses.

The benchmark numbers back this up. It tops Terminal-Bench 2.0 for agentic coding tasks and outperforms every other frontier model on GDPval-AA, which tests real-world knowledge work across finance, legal, and other professional domains. It beat OpenAI’s GPT-5.2 by about 144 Elo points on that eval, which translates to winning roughly 70% of head-to-head comparisons.

1M token context window (finally)

This is a first for Opus-class models. Previous versions topped out at 200k tokens. The practical difference: you can feed it an entire codebase, a full project’s documentation, or months of conversation history without it losing track of details buried deep in the context.

On a needle-in-a-haystack retrieval benchmark (MRCR v2), Opus 4.6 scored 76% compared to Sonnet 4.5’s 18.5%. That’s not an incremental improvement. That’s a qualitative shift in how much context the model can actually use.

There’s a catch: prompts over 200k tokens cost more ($10/$37.50 per million input/output tokens vs. the standard $5/$25). Standard pricing stays the same.

New developer features worth knowing about

Adaptive thinking. Previously you had a binary choice: extended thinking on or off. Now the model decides when deeper reasoning is actually needed. You can still override this with effort controls (low, medium, high, max).

Context compaction. For long-running agents and conversations that bump up against the context window, the model can now automatically summarize and replace older context. This is useful for agentic workflows that need to run for extended periods.

Agent teams in Claude Code. You can spin up multiple agents that work in parallel and coordinate autonomously. This is in research preview, but the use case is clear: tasks that split into independent work like codebase reviews or multi-file refactors.

128k output tokens. Larger outputs in a single request, which means fewer multi-request workarounds for big generation tasks.

What the early adopters are saying

The testimonials from Anthropic’s early access partners are worth reading. A few that stood out to me:

Cursor’s CEO noted it “stays on long-horizon tasks where others drop off.” SentinelOne reported it handled a multi-million-line codebase migration “like a senior engineer,” finishing in half the time. Rakuten said it autonomously closed 13 issues and assigned 12 more across a 50-person org in a single day, and knew when to escalate to a human.

NBIM ran 40 cybersecurity investigations in a blind comparison against Claude 4.5 models. Opus 4.6 produced the best results 38 out of 40 times.

Safety stays consistent

One thing worth noting: the intelligence gains didn’t come at the expense of safety alignment. Opus 4.6 shows misalignment rates as good as or better than Opus 4.5, and has the lowest rate of over-refusals of any recent Claude model. That last part matters practically. Fewer false refusals means less time wrestling with the model over legitimate requests.

My take

I use Claude Code and the API daily for WordPress development work. The improvements that matter most to me: better performance in large codebases, longer sustained focus during agentic tasks, and the expanded context window. These directly address the friction points I’ve hit with previous versions.

The model is available now on claude.ai, the API (claude-opus-4-6), and major cloud platforms. If you’re already on Claude, you’re already using it. If you’re on the API, the model string is claude-opus-4-6 and pricing hasn’t changed for standard usage.

Full announcement from Anthropic: Introducing Claude Opus 4.6 ↗