Based on MRCR v2, NIAH, NoLiMa, and LongSWE-Bench benchmark results across Claude, GPT, and Gemini models · March 2026
| Zone | Token Range | Risk | Action |
|---|---|---|---|
| Safe | 0 – 8K | All models at peak performance | No action needed |
| Caution | 8K – 32K | Lost-in-the-middle effect begins (30%+ accuracy drop on mid-context information) | Place key info at start/end |
| Degrade | 32K – 64K | Most models below 50% baseline on non-literal tasks | Compact history, use RAG |
| High Risk | 64K – 128K | Coding accuracy falls from 29% to 3%; response latency spikes | Aggressively compact or /clear |
| Critical | 128K+ | Only Opus 4.6 & Gemini 2.5 Pro remain viable | Retrieval-only tasks; no complex reasoning |
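The table's thresholds translate naturally into a lookup you can run before sending a prompt. The sketch below is illustrative only: the `classify_context` function and `ZONES` constant are hypothetical names, and the thresholds simply mirror the table above.

```python
# Hypothetical helper mapping a prompt's token count to the risk zones
# in the table above. Thresholds and actions are taken from the table;
# this function is an illustrative sketch, not part of any real API.
ZONES = [
    (8_000, "Safe", "No action needed"),
    (32_000, "Caution", "Place key info at start/end"),
    (64_000, "Degrade", "Compact history, use RAG"),
    (128_000, "High Risk", "Aggressively compact or /clear"),
]

def classify_context(tokens: int) -> tuple[str, str]:
    """Return (zone, recommended action) for a given token count."""
    for upper_bound, zone, action in ZONES:
        if tokens < upper_bound:
            return zone, action
    # Beyond 128K, the table recommends retrieval-only use.
    return "Critical", "Retrieval-only tasks; no complex reasoning"

print(classify_context(5_000))    # falls in the Safe zone
print(classify_context(200_000))  # falls in the Critical zone
```

A check like this is cheap to run on every request, so it can gate automatic compaction or history clearing before a prompt crosses into the degraded zones.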