Quality % is computed against the selected reference (top-right). SWE-bench Pro is the trustworthy benchmark; Verified is contaminated.
Model
Provider
Quality vs ref
SWE-Pro
SWE-Ver
LCB
AIME
In $/Mtok
Out $/Mtok
Cache
Context
tok/s
Released
Quota burn cross-matrix
Burn ratios shown as multiples of the selected reference model at medium effort. OpenAI (Codex /effort) and Anthropic (Claude Code /effort) share the same vocabulary — low / medium / high / xhigh — with Anthropic adding 'max' for Opus 4.7. Google uses thinking budgets. Multipliers stack: Fast mode ×2.0, cached input ×0.6, plan mode forces high. Switch the reference dropdown at the top of the page to recompute all ratios.
Subscription tiers
Provider
Tier
$/mo
Limits
Models
Features
Automated agent policy
Provider
Sub on automation
Enforcement
First-party exception
API needed?
Agent harness landscape
Two workflow modes: supervised (you wait) and autonomous (overnight). Anthropic OAuth blocked third-party tools on April 4, 2026 — affected harnesses marked with *.
Harness
Vendor
Category
MCP
Skills
Hooks
Subagents
Voice
Remote
Computer Use
LSP
Memory
SWE-Pro
Pricing
Detailed profiles
Hardware × model fit
Quantization, VRAM used, and tok/s estimates per hardware config. Quality % is vs selected reference.
Hardware options
Hardware
Type
VRAM
Cost
Notes
Inference frameworks
Framework
Best for
Notes
Capability radar
6-axis capability rating per model (1–5 each; see
rubric).
Composite is the weighted average of available axes. Reference model
below the chart is the one selected in the top-right dropdown.