Issue #2 · AI Agent Insider
Issue #2: Foundation Models Get Competition
Sunday, March 8, 2026 · 5 min read
The infrastructure layer just got a lot more serious. If you’re still treating AI agents as prototypes, this week’s moves suggest the enterprise tier disagrees.
This Week’s Signal
Legora, a Swedish legal AI startup, went from 250 to 400+ law firm customers in under six months and is now eyeing a $400M raise at a $5–6B valuation — up from $1.8B. That’s a roughly 3x valuation jump in five months, with customers across 40+ markets. Vertical AI agents with real workflow lock-in are compressing the timeline from MVP to unicorn. Take note of the playbook: deep domain expertise and real workflow replacement, not chat.
3 Operator Playbooks
1. Migrate Your Enterprise Agent Stack to Bedrock Stateful Runtime
OpenAI and AWS co-launched a Stateful Runtime Environment on Amazon Bedrock with persistent context and memory across multi-step workflows. AWS is now the exclusive third-party cloud distributor for OpenAI Frontier enterprise agents. If you’re running multi-step enterprise workflows on stateless infrastructure, the migration path is live. Start with one high-latency workflow (e.g. contract review, data pipeline monitoring), port it to Bedrock, and benchmark context retention across sessions. AgentCore Policy (GA as of March 3) gives you Cedar-based natural-language access controls — use it to scope agent permissions by role from day one.
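The "benchmark context retention across sessions" step can be made concrete with a model-agnostic harness: seed a fact early in a session, probe for it in a later turn, and score recall. This is an illustrative sketch, not a Bedrock implementation — `ask(session_id, prompt)` is a placeholder for whatever invocation call your runtime exposes, and the in-memory stub exists only so the example runs:

```python
def retention_score(ask, probes):
    """Seed a fact per session, then probe for it later; return fraction recalled.

    `ask(session_id, prompt) -> str` is a placeholder for your runtime's
    invocation call. `probes` is a list of tuples:
    (session_id, seed_statement, probe_question, expected_answer).
    """
    recalled = 0
    for session_id, seed, question, answer in probes:
        ask(session_id, seed)              # turn 1: plant the fact
        reply = ask(session_id, question)  # later turn: probe for it
        recalled += answer.lower() in reply.lower()
    return recalled / len(probes)


# In-memory stub standing in for a stateful runtime (illustration only):
# it "remembers" by echoing back everything said in the session so far.
_memory: dict = {}

def stub_ask(session_id, prompt):
    _memory.setdefault(session_id, []).append(prompt)
    return " ".join(_memory[session_id])
```

Run the same probe set against your current stateless setup and the stateful runtime; the retention delta is your migration case in one number.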
2. Swap Your Security Scanner for Codex Security Agent — Free First Month
OpenAI’s Codex Security Agent is in research preview. In testing, it scanned 1.2M commits, surfaced 792 critical findings, cut false positives by 50%+, and caught over-reported severity in 90%+ of cases. That last number matters: bloated severity reports kill triage velocity. If you’re on Pro or Enterprise, you get the first month free. Deploy it on your main repo this week. Use the findings to benchmark against your current tool. If it outperforms on precision, swap it in as your primary AppSec pass.
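A precision benchmark only needs a hand-triaged sample. A minimal sketch, assuming each tool emits findings you can normalize to hashable keys (e.g. `(file, rule)` pairs) and that you have labeled a sample of them as real issues:

```python
def precision(findings: set, true_positives: set) -> float:
    """Fraction of a tool's findings that are confirmed real issues."""
    if not findings:
        return 0.0
    return len(findings & true_positives) / len(findings)


def compare_scanners(current: set, candidate: set, true_positives: set) -> dict:
    """Side-by-side precision; a positive delta favors the candidate tool."""
    p_cur = precision(current, true_positives)
    p_new = precision(candidate, true_positives)
    return {"current": p_cur, "candidate": p_new, "delta": p_new - p_cur}
```

Triage a sample of 50–100 findings from each tool, label them, and let the delta decide the swap.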
3. Route Complex Agent Tasks to Gemini 3.1 Pro at $2/M Tokens
Gemini 3.1 Pro now leads agent-specific benchmarks: 77.1% on ARC-AGI-2 (double its predecessor’s score), 94.3% on GPQA-Diamond, 33.5% on APEX-Agents. At $2/M input tokens, it’s cost-competitive for complex reasoning chains. Identify the two or three agent tasks in your stack where failures are highest — multi-step research, code reasoning, domain Q&A — and run a 50-call A/B test against your current model. Log accuracy and cost. If the delta holds, reroute those specific tasks to Gemini 3.1 Pro.
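The 50-call A/B test fits in a few lines. A hedged sketch: the two `call_*` arguments are placeholders for your actual client calls, and per-call costs are assumed flat (swap in real token accounting for production):

```python
def ab_test(inputs, expected, call_current, call_candidate,
            cost_current_per_call, cost_candidate_per_call):
    """Run both models over the same inputs; return accuracy and total cost each.

    `call_*` are placeholders for your model client calls; `expected` holds
    known-good answers so accuracy is checkable.
    """
    results = {}
    models = [
        ("current", call_current, cost_current_per_call),
        ("candidate", call_candidate, cost_candidate_per_call),
    ]
    for name, call, cost in models:
        correct = sum(1 for x, want in zip(inputs, expected) if call(x) == want)
        results[name] = {
            "accuracy": correct / len(inputs),
            "cost": cost * len(inputs),
        }
    return results
```

Feed it your 50 worst-failure-rate tasks with graded answers; the accuracy delta divided by the cost delta is your rerouting decision.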
Steal This
The Model-Routing Prompt Template
When building multi-agent orchestrators, add this system instruction to your router: “Score this task on: (1) reasoning depth 1–5, (2) domain specificity 1–5, (3) tool use required Y/N. Route a combined score of 8+ to [Model A], tool-use tasks to [Model B], and all others to [Model C].” This lets you apply benchmark data to routing decisions without hardcoding model names.
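The same rubric can live in code when you want deterministic routing outside the prompt. A sketch under two assumptions the template leaves open: tool-use tasks take precedence, and “score 8+” means the sum of the two 1–5 ratings. The model slot names are placeholders:

```python
def route_task(reasoning_depth: int, domain_specificity: int,
               needs_tools: bool) -> str:
    """Route per the rubric: tool-use tasks first, then combined score.

    Assumptions (not fixed by the template): tool use takes precedence,
    and "score 8+" is the sum of the two 1-5 ratings.
    """
    if needs_tools:
        return "model-b"   # tool-use tasks -> [Model B]
    if reasoning_depth + domain_specificity >= 8:
        return "model-a"   # heavy reasoning + domain -> [Model A]
    return "model-c"       # everything else -> [Model C]
```

Keeping the slot names abstract (`model-a`, `model-b`, `model-c`) means a benchmark-driven model swap is a one-line config change, not a router rewrite.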
If this issue saved you an hour of research, forward it to one operator on your team — they’ll thank you. Subscribe at [newsletter link] to get Issue #3 next Sunday.
Stay sharp.
New issues every Sunday. No spam, no fluff — just the practitioner's edge.