Issue #61 · AI Agent Insider
Google's Managed Agents API: From "Build Your Own Scaffolding" to One API Call
Wednesday, May 20, 2026 · 11 min read
The Hook
Google shipped the developer infrastructure for the agentic era at I/O 2026. A single API call now provisions a full production agent – code execution, tool use, isolated sandbox, persistent state – without any orchestration scaffolding. Simultaneously, Cohere made its second European acquisition in as many weeks, targeting sovereign pharma AI. And the benchmark picture on Gemini 3.5 Flash rewrites the cost-performance tradeoff operators have been using to justify their model choices.
This Week’s Signal
Google’s Managed Agents API: From “Build Your Own Scaffolding” to One API Call
For the past two years, the hardest part of deploying a production AI agent has not been picking a model – it has been the infrastructure underneath it. Isolated sandboxes, state persistence, tool invocation, code execution environments, multi-agent coordination – each of these required bespoke engineering work before any actual agent logic ran. At I/O 2026, Google eliminated most of that.
The Managed Agents API in the Gemini API – launched May 19 in preview – provisions a complete agent runtime with a single call to the Interactions API. What you get: an ephemeral Linux environment where the agent can reason, plan, call tools, execute code, browse the web, and manage files. Follow-up calls resume the same environment with full file and state continuity. No sandbox management. No session infrastructure. No orchestration layer to write.
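To make the provision-then-resume pattern concrete, here is a minimal in-memory sketch. It does not call the Gemini API, and every name in it (`FakeAgentRuntime`, `interact`, `env_id`) is a hypothetical stand-in rather than anything from Google's SDK. It only models the behavior described above: a first call creates an environment, and a follow-up call that passes the same id sees the files left behind.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Environment:
    """Stand-in for one sandbox: an id plus the files the agent has written."""
    env_id: str
    files: Dict[str, str] = field(default_factory=dict)


class FakeAgentRuntime:
    """In-memory toy, NOT the Gemini API: models provision-then-resume only."""

    def __init__(self) -> None:
        self._envs: Dict[str, Environment] = {}

    def interact(self, prompt: str, env_id: Optional[str] = None) -> Environment:
        # No id: provision a fresh environment. Known id: resume it as-is.
        if env_id is None:
            env = Environment(env_id=uuid.uuid4().hex)
            self._envs[env.env_id] = env
        else:
            env = self._envs[env_id]
        # Pretend the agent's work left one file behind in the sandbox.
        env.files[f"step_{len(env.files) + 1}.txt"] = prompt
        return env


runtime = FakeAgentRuntime()
first = runtime.interact("Clone the repo and run the tests")
second = runtime.interact("Fix the failing test", env_id=first.env_id)
print(sorted(second.files))  # files from both calls are visible
```

The point of the sketch is the calling shape: the only thing the caller holds between requests is an identifier, and the state lives on the runtime side.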
The harness underneath is Antigravity, the same agent harness that powers Google’s own products – Deep Research, Gemini Spark, AI Mode in Search. Developers get the infrastructure Google has been running internally, not a stripped-down approximation.
The customization model is also worth noting. Rather than requiring code to define agent behavior, Google is using AGENTS.md and SKILL.md files – markdown definitions that register as managed agents. This is consistent with how agentic frameworks have evolved: declarative configuration over imperative orchestration code. Custom agent templates will ship in AI Studio for enterprise users to clone and modify.
The model powering all of this is Gemini 3.5 Flash, also launched at I/O 2026. The benchmark numbers are concrete: 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 1656 Elo on GDPval-AA for agentic tasks – ahead of Gemini 3.1 Pro across agentic and coding benchmarks, while generating output tokens 4x faster than other frontier models. Google’s claim that it costs less than half as much as comparable frontier models is the operational number that changes build/buy math for teams currently running on Claude or GPT-4-class models.
The Antigravity 2.0 desktop app and CLI ship alongside the API. The desktop app supports parallel agent orchestration – multiple agents executing tasks simultaneously – with scheduled background tasks and native integrations into Firebase, Android Studio, and Google AI Studio. The CLI migrates Gemini CLI users to a lighter, terminal-native surface. The Antigravity SDK provides programmatic access to the same harness, deployable on any infrastructure.
Gemini Spark – announced separately but architecturally connected – is the consumer-facing manifestation of this stack. Built on Gemini 3.5 Flash and the Antigravity harness, Spark runs 24/7 on dedicated Google Cloud VMs without requiring the user’s device to stay on. It integrates out-of-the-box with Gmail, Docs, Sheets, and Slides, and connects to any third-party service over MCP. Available to AI Ultra ($200/month) subscribers starting next week.
What this means for your stack: The practical implication is a reset on infrastructure cost and time-to-deploy for agent products. Teams that have been deferring agentic product development because of orchestration complexity now have a production-grade baseline to build on. The tradeoff is Google Cloud lock-in for the managed runtime layer – the SDK deploys on your own infrastructure, but Managed Agents requires the Gemini API. For operators currently on AWS or Azure, the evaluation question is whether the infrastructure abstraction is worth the migration cost. For operators starting fresh or building net-new agent workflows, the economics are hard to argue against: one API call, persistent state, isolated execution, at what Google says is less than half the cost of comparable frontier models.
3 Operator Playbooks
1. Cohere Is Building a Sovereign AI Stack for Regulated Verticals – DOMAIN: Business & Strategy
Cohere acquired Reliant AI on May 19, bringing a Berlin- and Montreal-based biopharma AI startup – $11.3 million seed round, founded by former Google and DeepMind researchers – into its product portfolio. The technology becomes “North for Pharma,” an agentic system targeting R&D workflows, clinical development, and scientific analytics. Existing pharma clients – GSK, Medicus Pharma, and Kyowa Kirin – transfer with the deal.
This is Cohere’s second European acquisition in weeks, following its merger with Aleph Alpha, Germany’s sovereign-AI champion. The pattern is deliberate: Cohere is assembling a vertically specialized, sovereign-deployment stack rather than competing head-to-head with OpenAI and Anthropic on general-purpose model quality. “Sovereign” here means models run on customer-controlled infrastructure, data never leaves the customer’s environment, and the deployment can be air-gapped if required. In regulated industries – pharma, financial services, government – this is not a nice-to-have; it is the procurement requirement that blocks every other vendor.
The Reliant AI acquisition specifically targets systematic literature review automation and unstructured regulatory data extraction – two workflows that consume enormous analyst time in biopharma and that hyperscaler distribution models have failed to serve because they require custom deployment and domain-specific compliance controls. Cohere is betting that vertical depth plus sovereign deployment creates a defensible moat in segments where data privacy makes cloud-hosted AI a non-starter.
Your move: If you are selling AI solutions into regulated industries – healthcare, financial services, legal, government – the Cohere playbook is a competitive signal about where the market is heading. Generic API access does not close these deals. The questions your enterprise buyers are asking are: Where does the data go? Who can see it? Can you deploy on our infrastructure? If your current stack cannot answer those questions with “your infrastructure, no one, yes,” you are competing on general capability against vendors who are competing on compliance and control. For operators building in regulated verticals, evaluate sovereign deployment options before your next enterprise pitch. For operators outside regulated verticals, watch this space: the same sovereign-deployment demand is spreading from healthcare to fintech to infrastructure.
2. Gemini 3.5 Flash Is the New Cost-Performance Baseline – DOMAIN: Research & Science
The benchmark data from Google’s Gemini 3.5 Flash release is operationally significant, not just competitively interesting. The numbers: Terminal-Bench 2.1 at 76.2%, MCP Atlas at 83.6% (which measures tool use over MCP servers), GDPval-AA at 1656 Elo for agentic task completion, CharXiv Reasoning at 84.2% for multimodal understanding. These are not cherry-picked single-benchmark claims – they represent the full agentic task profile that operators actually care about: coding, tool use, multi-step reasoning, and multimodal input.
The speed and cost claims are the part that changes operator math. Google states 3.5 Flash runs at 4x the output token throughput of other frontier models and costs less than half the price of comparable options. Taken together, those claims put the model in the top-right quadrant of the Artificial Analysis index – the position that has historically corresponded to the model that captures the bulk of production deployments. That quadrant has not been occupied by a Google model in recent memory.
The co-optimization with Antigravity matters here. 3.5 Flash is designed to deploy as a subagent in multi-agent pipelines, not just as a standalone model. In practice, this means an orchestrator agent (3.5 Pro, coming next month) can dispatch multiple Flash subagents in parallel, each with isolated execution environments, at a cost structure that makes large-scale multi-agent deployments economically viable for the first time. The synthesis of the AlphaZero paper into a playable game – two agents, six hours – is the kind of demonstration that translates directly into real engineering workflows.
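The orchestrator-plus-subagents shape is easier to reason about in code. The sketch below is a generic fan-out using Python's `asyncio`, not Antigravity or the Gemini API: `run_subagent` is a hypothetical stub that sleeps instead of calling a model, and `max_parallel` is an assumed knob for capping concurrency.

```python
import asyncio
from typing import List


async def run_subagent(task: str) -> str:
    """Hypothetical stub for one subagent; a real one would call a model API."""
    await asyncio.sleep(0.01)  # stands in for model and tool latency
    return f"done: {task}"


async def orchestrate(tasks: List[str], max_parallel: int = 4) -> List[str]:
    """Fan tasks out to subagents, capping how many run at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with limit:
            return await run_subagent(task)

    # gather preserves input order, so results line up with tasks.
    return list(await asyncio.gather(*(guarded(t) for t in tasks)))


results = asyncio.run(orchestrate(["parse paper", "write engine", "build UI"]))
print(results)
```

The concurrency cap is the part worth keeping in a real pipeline: per-subagent cost may be low, but an uncapped fan-out is how a cheap model turns into an expensive bill.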
Your move: Re-run your current agent workload cost estimates against Gemini 3.5 Flash pricing. For any workflow where you are paying frontier-model rates for fast, parallel, tool-calling tasks – summarization, data extraction, code generation, structured output – the economics warrant a benchmark test before your next infrastructure renewal. The test is straightforward: run 50 representative tasks through your current model and through 3.5 Flash via the Gemini API; measure accuracy, latency, and cost per task. If Flash performs within 5% on accuracy at half the cost, the switching math is simple. Do not wait for your model vendor to tell you the market shifted.
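A rough sketch of that comparison, assuming you have already logged one record per task for each model. The `TaskResult` fields, the thresholds, and the sample numbers are all illustrative placeholders, and "within 5%" is read here as an accuracy drop of at most 5 percentage points; adjust if you mean a relative drop.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TaskResult:
    correct: bool     # did the output pass your own grading check?
    latency_s: float  # wall-clock seconds for the task
    cost_usd: float   # billed cost for the task


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Average accuracy, latency, and cost per task over one model's run."""
    return {
        "accuracy": mean(1.0 if r.correct else 0.0 for r in results),
        "latency_s": mean(r.latency_s for r in results),
        "cost_usd": mean(r.cost_usd for r in results),
    }


def should_switch(current: Dict[str, float], candidate: Dict[str, float],
                  max_accuracy_drop: float = 0.05,
                  max_cost_ratio: float = 0.5) -> bool:
    """True if the candidate loses little accuracy and costs at most half."""
    acc_ok = current["accuracy"] - candidate["accuracy"] <= max_accuracy_drop
    cost_ok = candidate["cost_usd"] <= max_cost_ratio * current["cost_usd"]
    return acc_ok and cost_ok


# Placeholder data for 50 tasks per model; substitute your own logs.
current = summarize([TaskResult(i < 45, 2.0, 0.010) for i in range(50)])
candidate = summarize([TaskResult(i < 44, 0.8, 0.004) for i in range(50)])
print(current, candidate, should_switch(current, candidate))
```

Latency is reported but deliberately left out of the decision rule, since the paragraph above frames the switch in terms of accuracy and cost; add a third condition if response time is a hard requirement for your workload.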
3. Google Search Now Runs Agents That Watch the Web While You Work – DOMAIN: Operator Wins & Failures
The most overlooked announcement at Google I/O 2026 has direct operational implications for anyone running a business that depends on competitive or market intelligence. Starting this summer, Google Search will allow users to create, configure, and run persistent information agents that monitor the web 24/7 and synthesize changes into personalized alerts. This is not Google Alerts rebranded – these agents understand context, not just keyword matches, and can “make sense” of changes rather than just detecting them.
The framing in Google’s announcement was deliberately understated. Pichai described it as an evolution of Google Alerts. The more accurate description: it is the first mass-market deployment of ambient web-monitoring agents at consumer scale, running on the same Gemini 3.5 Flash and Antigravity harness that powers the developer API. Each user’s information agents run on dedicated GCP infrastructure. Multiple agents can be active simultaneously, each with custom parameters.
For operators, this creates two distinct dynamics. First, the competitive intelligence gap between well-resourced and under-resourced teams narrows dramatically when every Google AI Ultra subscriber has access to persistent web-monitoring agents for $200/month. Second, the behavior of these agents – what they index, how they synthesize, and what they surface – will affect how AI-generated content performs in Google’s ecosystem. Content that triggers agent actions (articles that change, events that evolve, decisions that update) will be valued differently than static content in an agent-mediated information flow.
Your move: For any workflow where you are currently paying for a dedicated market-intelligence platform – competitor tracking, regulatory monitoring, earnings surveillance – evaluate whether Google’s information agents match 80% of your use case at a fraction of the cost once they launch this summer. If they do, reallocate that budget toward the 20% of intelligence work that requires human judgment or proprietary data access. For content operators: think about how your content structure performs in an agent-mediated distribution environment. Agents respond to changes, updates, and evolving narratives. Static, evergreen content is less likely to surface in agent-driven alerts than content that updates as situations develop.
Steal This
The Managed Agent Evaluation Checklist
Before migrating any existing agent workflow to a new managed runtime (Google Managed Agents, AWS AgentCore, Azure AI Foundry, or any equivalent), run this 20-minute evaluation against your current deployment. Use it to decide whether the infrastructure abstraction is worth the migration cost.
MANAGED AGENT RUNTIME EVALUATION
================================
Current stack: _______________
Candidate runtime: _______________
Workload type: _______________
Review date: _______________

CAPABILITY FIT
[ ] Does the managed runtime support every tool type your agent uses?
    List each: web browse / code exec / file I/O / external APIs / MCP
[ ] Does it support multi-agent orchestration (subagent spawning)?
[ ] Is state persistence between sessions supported natively?
[ ] What is the maximum context window for the backing model?
[ ] Does the runtime support your required input modalities (text /
    image / audio / structured data)?

COST COMPARISON
[ ] Price per 1M input tokens (current vs. candidate): _____ vs. _____
[ ] Price per 1M output tokens (current vs. candidate): _____ vs. _____
[ ] Infrastructure/orchestration cost eliminated by migration: $_____
[ ] Engineering time saved per deployment (estimate hours): _____
[ ] Break-even point (months): _____

LOCK-IN ASSESSMENT
[ ] What percentage of agent logic is in the managed harness vs. your code?
[ ] Can you export agent definitions (AGENTS.md / system prompts) and
    redeploy to a different runtime if the vendor raises prices?
[ ] Does the runtime use open standards (MCP, OpenAI-compatible API)?
[ ] What is the vendor's SLA for the managed agent service?

SECURITY & COMPLIANCE
[ ] Does data leave your jurisdiction when the agent executes?
[ ] Is there an audit log for all agent actions (tool calls, outputs)?
[ ] Can you restrict the agent's egress network access?
[ ] Is the execution environment certifiable for your compliance regime
    (SOC 2, HIPAA, ISO 27001)?

MIGRATION DECISION GATE
If cost savings exceed lock-in risk AND compliance requirements are met:
    -> Migrate
If compliance requirements are NOT met:
    -> Evaluate sovereign deployment options before proceeding
If cost savings are marginal but infrastructure complexity is high:
    -> Migrate the simplest workflow first, validate, then expand
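If you want the break-even line and the decision gate in a form you can rerun as numbers change, here is one possible encoding. Two assumptions are mine, not the checklist's: break-even is taken as one-off migration cost divided by monthly savings, and the compliance check is evaluated first so that a failed compliance answer always wins. The fallback message for the case where no gate matches is also an addition, and the dollar figures are placeholders.

```python
from typing import Optional


def break_even_months(migration_cost_usd: float,
                      monthly_savings_usd: float) -> Optional[float]:
    """Months until a one-off migration cost is repaid; None if never."""
    if monthly_savings_usd <= 0:
        return None
    return migration_cost_usd / monthly_savings_usd


def migration_decision(compliance_met: bool,
                       savings_exceed_lock_in_risk: bool,
                       savings_marginal: bool,
                       complexity_high: bool) -> str:
    """Apply the three gates in order, compliance first (an assumed ordering)."""
    if not compliance_met:
        return "Evaluate sovereign deployment options before proceeding"
    if savings_exceed_lock_in_risk:
        return "Migrate"
    if savings_marginal and complexity_high:
        return "Migrate the simplest workflow first, validate, then expand"
    # Not covered by the checklist; added so the function always answers.
    return "No gate matched: stay on the current stack and revisit"


# Placeholder figures: $30,000 to migrate, $5,000 saved per month.
print(break_even_months(30_000, 5_000))
print(migration_decision(compliance_met=True,
                         savings_exceed_lock_in_risk=False,
                         savings_marginal=True,
                         complexity_high=True))
```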
The Bottom Line
Google I/O 2026 was a developer conference with a clear operational thesis: the infrastructure overhead that has been blocking agent deployment at scale is now Google’s problem, not yours. The Managed Agents API, Antigravity SDK, Gemini 3.5 Flash, and Gemini Spark form a coherent stack – from low-level agent runtime to consumer-facing 24/7 assistant – that raises the bar for every vendor competing in this space. Cohere’s back-to-back European acquisitions tell a parallel story: the teams building for regulated verticals are not waiting for hyperscalers to solve sovereign deployment; they are assembling purpose-built stacks that win on compliance before they compete on capability. The cost-performance benchmark on 3.5 Flash is the operator forcing function: if you have not repriced your current model spend against the new frontier, you are leaving money on the table. The infrastructure question is settling. The business-model and compliance questions are where the next competitive gaps open.
AI Insider is published by Digital Forge Studios Inc.
Stay sharp.
New issues every weekday. No spam, no fluff — just the practitioner's edge.