Context Rot in LLMs — Performance vs Context Size

MRCR v2, NIAH, NoLiMa, and LongCodeBench benchmarks across Claude, GPT, and Gemini models · March 2026

32K — context size at which 11/12 models drop below 50% of baseline (NoLiMa)
Opus 4.6 improvement over Sonnet 4.5 on MRCR @ 1M
90% — coding quality loss from 32K → 256K (Sonnet 3.5)
60% — recommended max context utilization before compaction
Key insight: standard Needle-in-a-Haystack scores are misleading. GPT-4.1 scores 100% and Claude 3.5 Sonnet ~99% on lexical NIAH, but when NoLiMa removes the keyword overlap between needle and question, 11/12 models drop below 50% of their baseline at just 32K tokens. In practice, the effective context window is typically only 60–70% of the advertised maximum.
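
A practical consequence is to budget prompts against the effective window rather than the advertised one. Below is a minimal sketch of that arithmetic, assuming the conservative end of the 60–70% estimate above; the 0.6 multiplier, constants, and function name are illustrative, not from any vendor API.

```python
# Minimal sketch: derive a usable token budget from an advertised context
# window, using the conservative end of the 60-70% effective-window estimate.

EFFECTIVE_FRACTION = 0.6  # assumption: tune per model and task


def effective_budget(advertised_window: int,
                     fraction: float = EFFECTIVE_FRACTION) -> int:
    """Token count to stay under before degradation becomes likely."""
    return int(advertised_window * fraction)


# Example: a hypothetical 200K-token model yields a ~120K working budget.
print(effective_budget(200_000))  # -> 120000
```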

[Chart: MRCR v2 (8-needle) — Score vs Context Size]

[Chart: Task-Specific Degradation Rate]

[Chart: Model Accuracy at Key Context Sizes (MRCR + NoLiMa)]

Context Danger Zones — When to Compact

Zone | Token Range | Risk | Action
Safe | 0 – 8K | All models at peak performance | No action needed
Caution | 8K – 32K | Lost-in-the-middle effect begins (30%+ drop for mid-context info) | Place key info at start/end
Degrade | 32K – 64K | Most models below 50% of baseline on non-literal tasks | Compact history, use RAG
High Risk | 64K – 128K | Coding accuracy falls 29% → 3%; response times spike | Aggressively compact or /clear
Critical | 128K+ | Only Opus 4.6 & Gemini 2.5 Pro remain viable | Retrieval-only tasks; no complex reasoning
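
The zone boundaries above translate directly into a compaction policy. Here is a minimal sketch that maps a running token count to the table's zones and recommended actions; the thresholds are copied from the table, and the function name is illustrative.

```python
# Minimal sketch: classify a context size into the danger zones above.

def context_zone(tokens: int) -> tuple[str, str]:
    """Return (zone, recommended action) for a context of `tokens` tokens."""
    if tokens < 8_000:
        return "Safe", "No action needed"
    if tokens < 32_000:
        return "Caution", "Place key info at start/end"
    if tokens < 64_000:
        return "Degrade", "Compact history, use RAG"
    if tokens < 128_000:
        return "High Risk", "Aggressively compact or /clear"
    return "Critical", "Retrieval-only tasks; no complex reasoning"


print(context_zone(45_000))  # -> ('Degrade', 'Compact history, use RAG')
```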

[Chart: Standard NIAH vs NoLiMa (Real-World Retrieval)]

[Chart: Lost-in-the-Middle — Recall by Needle Position (128K)]
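
The U-shaped recall curve reported by Liu et al. (2023) suggests a simple mitigation: order retrieved chunks so the highest-ranked ones sit at the start and end of the prompt, where recall is strongest, pushing weaker chunks toward the middle. A minimal sketch of that reordering follows; the function name and example data are illustrative assumptions, not from the paper.

```python
# Minimal sketch: place the most relevant chunks at the edges of the prompt,
# pushing weaker chunks toward the middle where recall is worst.

def edge_order(chunks_by_relevance: list[str]) -> list[str]:
    """chunks_by_relevance is sorted best-first; returns prompt order."""
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # best chunk at start, second-best at end


print(edge_order(["d1", "d2", "d3", "d4", "d5"]))
# -> ['d1', 'd3', 'd5', 'd4', 'd2']
```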

Sources

Michelangelo Benchmark (DeepMind) — arXiv:2409.12640
OpenAI MRCR v2 Dataset — Hugging Face
MRCR v2 (8-needle) Leaderboard — llm-stats.com
Introducing Claude Opus 4.6 — anthropic.com
Claude Sonnet 4.6 System Card — anthropic.com
Context Rot: Chroma Research — trychroma.com
NoLiMa: Beyond Literal Matching — arXiv:2502.05167
Context Length Hurts Despite Perfect Retrieval — arXiv:2510.05381
LongCodeBench (1M coding eval) — arXiv:2505.07897
Effective Context Engineering for AI Agents — anthropic.com
Lost in the Middle (Liu et al. 2023) — arXiv:2307.03172
Long-Context RAG: OpenAI o1 & Gemini — Databricks
RULER Benchmark (NVIDIA, COLM 2024) — arXiv:2404.06654
Greg Kamradt's NIAH Test — GitHub
Gemini 1.5 Technical Report — arXiv:2403.05530
Maximum Effective Context Window — arXiv:2509.21361