Morning Edition LIVE
Vol. I · No. 1
Est.
MMXXVI

The A.I. Beat

Dispatches from the frontier of machine intelligence
Three
Dollars
← Front page Code May 20, 2026 · 6 min read
Code

Google goes all-in on AI agents at I/O, while Demis Hassabis says "we're just getting started"

Google's Gemini 3.5 Flash is now powering search, Android Studio, and billions of user interactions as the company bets its product line on agentic AI.
Google goes all-in on AI agents at I/O, while Demis Hassabis says "we're just getting started"

Google I/O has always been about showing off what’s next. This year, “what’s next” is AI agents doing your work for you, and Google wants Gemini running all of it.

The company released Gemini 3.5 Flash today, skipping the usual preview phase and going straight to general availability. That’s notable on its own, but what’s more interesting is where Google is deploying it: everywhere. The model is now live in the Gemini app, in AI Mode in Google Search, in Google’s new “agent-first development platform” called Google Antigravity, in Android Studio, and across enterprise products through Gemini Enterprise.

This isn’t a gradual rollout. Google says 3.5 Flash is “available today to billions of people globally.” That’s the kind of scale that suggests the company is confident enough in the model to make it the default experience for a huge chunk of its user base.

What’s actually different about 3.5 Flash

Simon Willison notes that 3.5 Flash is more expensive than its predecessor, which runs counter to the usual “Flash” naming convention. Typically, Flash models are the cheaper, faster variants. But Google appears to be positioning this one as capable enough to handle serious agentic tasks, not just quick summarization or chat.

The model is already integrated into LLM, Willison’s command-line tool for working with language models. He updated the llm-gemini plugin the same day to support it. That’s one of those small signals that matters: when a model drops and developers can start testing it within hours, you get real feedback fast.

Antigravity and the agent platform play

The other big announcement is Google Antigravity, which Google is calling an “agent-first development platform.” Details are still sparse, but the framing is clear: Google wants to own the stack for building AI agents, not just provide the models.

This puts Google in direct competition with frameworks like LangChain, Microsoft’s Semantic Kernel, and the growing ecosystem of agent orchestration tools. The difference is that Google controls the model, the API, the tooling, and now the platform. That vertical integration could be a major advantage, or it could be another Google product that developers try once and then ignore. The track record is mixed.

The agent era, according to Sundar

In his I/O keynote, Sundar Pichai leaned hard into the “agentic” framing. The blog post uses the phrase “agentic Gemini era” right in the title. Google wants you to think of AI not as a chatbot you ask questions, but as something that takes actions on your behalf.

That’s a shift in how these tools are positioned. It’s not “ask Gemini a question.” It’s “let Gemini do this for you.” Whether that’s booking a reservation, writing code, or managing your inbox, the pitch is that the AI should be proactive, not reactive.

The challenge is that agents are still brittle. A model that gets 95% of tasks right sounds great until you hit the 5% where it quietly does the wrong thing. Google’s betting that 3.5 Flash is good enough to handle that risk at scale. We’ll see if users agree.

Meanwhile, guardrails are getting serious attention

While Google was announcing its agent push, a project called Forge hit the front page of Hacker News with a bold claim: guardrails can take an 8B parameter model from 53% accuracy to 99% on agentic tasks.

That’s a huge jump, and if it holds up, it suggests that model quality isn’t the only thing that matters for agents. How you constrain the model, how you structure its outputs, and how you validate its actions might matter just as much as raw capability.

Forge is worth watching because it’s tackling the part of agentic AI that everyone worries about but few people are solving well: how do you make sure the agent doesn’t go off the rails? If smaller models with good guardrails can match or beat larger models without them, that changes the economics of deploying agents in production.

The race is on

Google isn’t the only one pushing agents. OpenAI has been positioning GPT-4 and its successors as capable of multi-step reasoning and tool use. Anthropic has been talking about agents for months. Microsoft is integrating agents into Copilot. But Google has distribution that the others don’t, and they’re using it.

Putting 3.5 Flash in front of billions of users isn’t just a product decision. It’s a data play. Every interaction teaches Google what works and what doesn’t. Every failure gets logged. Every success gets reinforced. At that scale, Google can iterate faster than anyone else.

The question is whether the model is ready for it. If 3.5 Flash handles the load and users don’t revolt, Google wins. If it breaks in visible, embarrassing ways, this could be another Google Assistant moment: lots of hype, disappointing execution, and users going back to what they were doing before.

We’ll know soon enough. When you deploy to billions of people, you don’t get much time to fix things quietly.

coding developer tools