Google I/O has always been about showing off what’s next. This year, “what’s next” is AI agents doing your work for you, and Google wants Gemini running all of it.
The company released Gemini 3.5 Flash today, skipping the usual preview phase and going straight to general availability. That’s notable on its own, but what’s more interesting is where Google is deploying it: everywhere. The model is now live in the Gemini app, in AI Mode in Google Search, in Google’s new “agent-first development platform” called Google Antigravity, in Android Studio, and across enterprise products through Gemini Enterprise.
This isn’t a gradual rollout. Google says 3.5 Flash is “available today to billions of people globally.” That’s the kind of scale that suggests the company is confident enough in the model to make it the default experience for a huge chunk of its user base.
Simon Willison notes that 3.5 Flash is more expensive than its predecessor, which runs counter to the usual “Flash” naming convention. Typically, Flash models are the cheaper, faster variants. But Google appears to be positioning this one as capable enough to handle serious agentic tasks, not just quick summarization or chat.
The model is already integrated into LLM, Willison’s command-line tool for working with language models. He updated the llm-gemini plugin the same day to support it. That’s one of those small signals that matter: when a model drops and developers can start testing it within hours, you get real feedback fast.
The other big announcement is Google Antigravity. Details are still sparse, but the framing is clear: Google wants to own the stack for building AI agents, not just provide the models.
This puts Google in direct competition with frameworks like LangChain, Microsoft’s Semantic Kernel, and the growing ecosystem of agent orchestration tools. The difference is that Google controls the model, the API, the tooling, and now the platform. That vertical integration could be a major advantage, or it could be another Google product that developers try once and then ignore. The track record is mixed.
In his I/O keynote, Sundar Pichai leaned hard into the “agentic” framing. The blog post uses the phrase “agentic Gemini era” right in the title. Google wants you to think of AI not as a chatbot you ask questions, but as something that takes actions on your behalf.
That’s a shift in how these tools are positioned. It’s not “ask Gemini a question.” It’s “let Gemini do this for you.” Whether that’s booking a reservation, writing code, or managing your inbox, the pitch is that the AI should be proactive, not reactive.
The challenge is that agents are still brittle. A model that gets 95% of tasks right sounds great until you hit the 5% where it quietly does the wrong thing. Google’s betting that 3.5 Flash is good enough to handle that risk at scale. We’ll see if users agree.
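One way to see why a small error rate bites: suppose, as an assumption the figures above do not state, that the 95% applied to each step of a multi-step task and that steps failed independently. A minimal sketch of how reliability would compound under that reading (the function name is hypothetical):

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


# Assumed per-step reliability of 95%, for illustration only.
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: {chain_success_rate(0.95, steps):.1%} end-to-end")
```

Under those assumptions, a ten-step task finishes cleanly about 60% of the time and a twenty-step task about 36% of the time.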
While Google was announcing its agent push, a project called Forge hit the front page of Hacker News with a bold claim: guardrails can take an 8B-parameter model from 53% accuracy to 99% on agentic tasks.
That’s a huge jump, and if it holds up, it suggests that model quality isn’t the only thing that matters for agents. How you constrain the model, how you structure its outputs, and how you validate its actions might matter just as much as raw capability.
Forge is worth watching because it’s tackling the part of agentic AI that everyone worries about but few people are solving well: how do you make sure the agent doesn’t go off the rails? If smaller models with good guardrails can match or beat larger models without them, that changes the economics of deploying agents in production.
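To make "constrain, structure, validate" concrete, here is a minimal sketch of a generic guardrail loop. It is not Forge's implementation, whose internals aren't described here; the action whitelist, the function names, and the stand-in model are all hypothetical. The idea is simply to accept only structured output that passes every check, retry otherwise, and refuse to act if nothing validates.

```python
import json
from typing import Callable, Optional

# Hypothetical whitelist of actions the agent may take (assumption for illustration).
ALLOWED_ACTIONS = {"search", "send_email", "create_event"}


def validate_action(raw: str) -> Optional[dict]:
    """Return the parsed action if it passes every check, else None."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None  # output was not valid JSON
    if not isinstance(action, dict):
        return None  # output must be a JSON object
    if action.get("name") not in ALLOWED_ACTIONS:
        return None  # action is outside the whitelist
    if not isinstance(action.get("args"), dict):
        return None  # arguments must be a JSON object
    return action


def guarded_call(model: Callable[[str], str], prompt: str,
                 max_attempts: int = 3) -> Optional[dict]:
    """Ask the model for an action, retrying until the output validates."""
    for _ in range(max_attempts):
        action = validate_action(model(prompt))
        if action is not None:
            return action
    return None  # refuse to act rather than run an unvalidated action


def make_flaky_model() -> Callable[[str], str]:
    """Stand-in for a small model: first reply is malformed, second is valid."""
    replies = iter([
        "delete everything",
        '{"name": "search", "args": {"q": "Google I/O"}}',
    ])
    return lambda prompt: next(replies)


print(guarded_call(make_flaky_model(), "find today's news"))
```

The malformed first reply never reaches an executor; only the second, validated action is returned. That is the sense in which a wrapper, rather than the model, can carry part of the reliability burden.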
Google isn’t the only one pushing agents. OpenAI has been positioning GPT-4 and its successors as capable of multi-step reasoning and tool use. Anthropic has been talking about agents for months. Microsoft is integrating agents into Copilot. But Google has distribution that the others don’t, and the company is using it.
Putting 3.5 Flash in front of billions of users isn’t just a product decision. It’s a data play. Every interaction teaches Google what works and what doesn’t. Every failure gets logged. Every success gets reinforced. At that scale, Google can iterate faster than anyone else.
The question is whether the model is ready for it. If 3.5 Flash handles the load and users don’t revolt, Google wins. If it breaks in visible, embarrassing ways, this could be another Google Assistant moment: lots of hype, disappointing execution, and users going back to what they were doing before.
We’ll know soon enough. When you deploy to billions of people, you don’t get much time to fix things quietly.