Issue #7 · AI Agent Insider

Issue #7: 1M-Token Context Goes Standard

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The price of long context just went to zero. The price of not using it just went up.


This Week’s Signal

Anthropic made the full 1M-token context window generally available for Opus 4.6 and Sonnet 4.6 — at standard per-token pricing. No multiplier. A 900K-token request is billed at the same rate as a 9K one. Media limits jumped to 600 images or PDF pages per request. Opus 4.6 scores 78.3% on MRCR v2, highest among frontier models at that context length. The practical implication: you can now load an entire codebase, thousands of contract pages, or the full trace of a long-running agent — tool calls, observations, intermediate reasoning — without the engineering tax of chunking, summarizing, or clearing context. The lossy workarounds that defined agent architecture for the last year are now optional overhead. Rebuild accordingly.
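The "no multiplier" point is worth making concrete. A minimal sketch of the flat-pricing math — the per-token rate below is a placeholder, not Anthropic's actual price list:

```python
# Hypothetical flat rate: $3 per million input tokens. Substitute the
# current published rate for your model; the point is the linearity.
RATE_PER_INPUT_TOKEN = 3.00 / 1_000_000

def request_cost(input_tokens: int) -> float:
    """With no long-context multiplier, cost scales linearly with tokens."""
    return input_tokens * RATE_PER_INPUT_TOKEN

# A 900K-token request costs exactly 100x a 9K one -- same per-token rate,
# no surcharge tier to architect around.
assert abs(request_cost(900_000) / request_cost(9_000) - 100) < 1e-9
```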


Operator Playbooks

1. Eliminate context management plumbing from your agent stack

If you’re running RAG pipelines, sliding window context managers, or summarization chains primarily because of context cost or limits — re-evaluate today. With 1M tokens at flat rate on both Opus and Sonnet, the cost math that justified those components may no longer hold. Run a side-by-side: full-context injection vs. your current RAG pipeline on the same task. Measure accuracy, latency, and total cost. Many teams will find the retrieval layer adds complexity without improving outcomes now that the model can hold the full document set.
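The side-by-side can be a short harness. This is a sketch with stub pipelines — `ask_full_context` and `ask_rag` stand in for your real code paths, and the per-token rate is a placeholder:

```python
import time

def ask_full_context(question, documents):
    # Stub: a real version would send every document to the model.
    return {"answer": "42", "input_tokens": sum(len(d.split()) for d in documents)}

def ask_rag(question, documents):
    # Stub retriever: pretend top-2 chunks were retrieved.
    retrieved = documents[:2]
    return {"answer": "42", "input_tokens": sum(len(d.split()) for d in retrieved)}

def run_trial(pipeline, question, documents, expected, rate_per_token=3e-6):
    """Score one pipeline on the three axes that matter: accuracy, latency, cost."""
    start = time.perf_counter()
    result = pipeline(question, documents)
    return {
        "correct": result["answer"] == expected,
        "latency_s": time.perf_counter() - start,
        "cost_usd": result["input_tokens"] * rate_per_token,  # hypothetical flat rate
    }

docs = ["chunk one ...", "chunk two ...", "chunk three ..."]
report = {
    name: run_trial(fn, "What is the answer?", docs, expected="42")
    for name, fn in [("full_context", ask_full_context), ("rag", ask_rag)]
}
```

Run it over a representative task set, not one question — the decision to delete the retrieval layer should come from the aggregate numbers.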

2. Set up autonomous research loops overnight

Karpathy’s autoresearch script is 630 lines of MIT-licensed Python. An AI agent reads its own code, hypothesizes improvements, modifies, runs experiments, evaluates, and iterates — entirely unattended. In 2 days it processed ~700 autonomous changes and found ~20 improvements that transferred to larger models. The 11% efficiency gain on a well-tuned benchmark is the headline, but the real takeaway is the method: define a metric, give the agent a compute budget, and let it optimize while you sleep. This works for ML training, but also for A/B test parameter sweeps, prompt optimization, and any workflow with a measurable objective.

3. Turn tribal knowledge into one-click agent Skills in Office

Anthropic’s Claude for Excel and PowerPoint now shares context across both apps in a single session, and teams can save repeatable workflows as “Skills” — one-click actions available to the whole org. The move: identify the 3 most-repeated analytical workflows in your team (variance analysis, deck prep, data formatting). Build each as a Skill. Deploy org-wide. Works via Claude account or through Bedrock, Vertex, or Foundry gateways. This converts institutional knowledge that lived in one person’s head into automated, reproducible agent actions.
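A Skill is, at its core, a folder with a `SKILL.md` file: YAML frontmatter naming and describing the skill, then instructions the agent follows. The skill content below is an illustrative sketch for the variance-analysis example, not a shipped workflow:

```markdown
---
name: variance-analysis
description: Month-over-month variance analysis on a budget workbook; flags line items that moved more than 10%.
---

# Variance Analysis

1. Read the actuals and budget sheets from the open workbook.
2. Compute variance and variance % for each line item.
3. Highlight rows where |variance %| > 10% and summarize the drivers in a new sheet.
```

The description field matters most: it's how the agent decides when the Skill applies, so write it the way a teammate would describe the task.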


Steal This

Autonomous optimization loop template (adapt from Karpathy’s autoresearch):

1. Pick ONE metric you can measure automatically (latency, cost, accuracy, conversion)
2. Write a baseline script that produces that metric
3. Give an agent read/write access to the script + a fixed compute budget (5-15 min)
4. Loop: agent reads code → hypothesizes change → modifies → runs → evaluates → keeps or reverts
5. Set a stopping condition (N iterations, time limit, or improvement plateau)
6. Review the diff in the morning — cherry-pick what transferred
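The loop above, reduced to a toy you can run in seconds: the "code" is a single parameter, the metric is a function we can score automatically, and the "agent" is a random mutator. Swap in an LLM-driven editor and a real benchmark for the actual workflow — this is a sketch of the control flow, not Karpathy's script.

```python
import random

def metric(param: float) -> float:
    """Higher is better; optimum at param == 3.0."""
    return -(param - 3.0) ** 2

def optimize(iterations: int = 200, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    best_param, best_score = 0.0, metric(0.0)        # baseline (step 2)
    for _ in range(iterations):                      # stopping condition (step 5)
        candidate = best_param + rng.gauss(0, 0.5)   # hypothesize a change (step 4)
        score = metric(candidate)                    # run + evaluate
        if score > best_score:                       # keep or revert
            best_param, best_score = candidate, score
    return best_param, best_score

param, score = optimize()
```

Greedy keep-or-revert is deliberately dumb; the point is that with an automatic metric, even a dumb proposer compounds overnight.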

Karpathy ran 126 experiments overnight on attempt one. 700 over a weekend. The constraint isn’t intelligence — it’s having a metric worth optimizing.


One More Thing

Manufact raised $6.3M in a round led by Peak XV (fka Sequoia India) to build MCP infrastructure — open-source tools for making software agent-accessible. Their thesis: every product on earth will need an agent-facing interface. Three-person team, YC batch. Meanwhile, Google published research showing AI agents learn to cooperate through diverse opponent training — no hardcoded coordination needed. Two signals pointing the same direction: the multi-agent future doesn’t need more orchestration frameworks. It needs better interfaces and better training.


Forward This

If this issue was useful, send it to one person building agents. That’s how we grow this — no ads, no sponsorships, just practitioners sharing the edge.

Read online → insider.dforge.ca

Support the forge

☕ Ko-fi 🎁 Patreon
ETH: 0x3a4289F5e19C5b39353e71e20107166B3cCB2EDB
BTC: 16Fhg23rQdpCr14wftDRWEv7Rzgg2qsj98
DOGE: DNofxUZe8Q5FSvVbqh24DKJz6jdeQxTv8x