The A.I. Beat

Dispatches from the frontier of machine intelligence


May 25, 2026 · 5 min read

Opinion

We're Breaking AI Faster Than We're Building It

From personality-based jailbreaks to garbage bug reports, the gap between AI deployment and AI understanding is getting dangerous.

By The AI Beat · Editorial Desk

The best thing about building with AI right now is also the worst thing: nobody really knows what they’re doing. Not you, not your startup, and not even Google.

This week brought us a perfect trifecta of “wait, that’s a problem?” moments. Google is still figuring out AI security in real time. Hackers are exploiting chatbot “personalities” to bypass safety guardrails. And developers are drowning in AI-generated bug reports so confidently wrong they’re worse than no reports at all.

Let’s start with the security angle, because it’s the most alarming. The first wave of AI jailbreaks was almost cute in retrospect. You could trick ChatGPT into reciting Windows activation keys, or worse, by asking it to roleplay as your late grandmother reading you a bedtime story. Stupid, but fixable.

Now? Hackers have figured out something more fundamental. AI systems don’t just process text; they adopt personas. And those personas have different security postures. Ask Claude to help you as a “helpful assistant” and you’ll hit guardrails. But frame your request in a way that activates its “creative writer” or “coding tutor” mode, and suddenly those guardrails get fuzzy. The system isn’t broken. The system is working exactly as designed. That’s the problem.

This isn’t theoretical. It’s happening now, and the people building these systems are patching holes as they appear. Google, with all its resources and AI expertise, is navigating this in real time just like everyone else. That should worry you.

The garbage-in problem

Then there’s the developer experience nightmare that Armin Ronacher flagged. People are using AI to rewrite their bug reports before submitting them. Sounds helpful, right? It’s not.

What arrives is confident gibberish. The original problem gets reworded into corporate speak. Root cause analysis that’s pure speculation. Minimal reproductions that aren’t minimal and don’t reproduce anything. Implementation suggestions for issues that were misunderstood in the first place. And all of it delivered with the unearned confidence of a junior consultant who just discovered management frameworks.

This is a canary in the coal mine. If AI-mediated communication is already degrading the signal-to-noise ratio in GitHub issues, what happens when it’s mediating everything else? Customer support tickets. Code reviews. Design documents. Legal filings.

The problem isn’t that AI generates bad content. It’s that it generates confidently formatted bad content. It takes a half-understood problem and makes it look fully baked. That’s worse than a typo-ridden mess written by an actual human, because at least with the mess you know to ask clarifying questions.

Nobody has a map

Here’s what ties these threads together: we’re in uncharted territory, and everyone’s pretending they have a map.

The companies shipping AI products don’t fully understand their own systems’ failure modes. They can’t, because these models are too complex and the attack surface is too weird. Security researchers are discovering that “personality” is an attack vector. That’s not something you can patch with a software update. It’s fundamental to how these systems work.

Meanwhile, users are adopting AI tools faster than anyone can figure out best practices. Is it good to use AI to rewrite your bug reports? Feels efficient. Turns out it makes them worse. Should you trust an AI agent with database access? Datasette thinks yes, and maybe they’re right. But we’re all figuring this out as we go.

The honest truth is that we’re running a massive real-world experiment, and the results are mixed. Some of it works brilliantly. Simon Willison is building genuinely useful tools that combine AI with databases in clever ways. But some of it is actively making things worse, and we won’t know which is which until we’ve already deployed it at scale.

What this means

I’m not arguing we should slow down. That ship has sailed, and frankly, some of this experimentation is producing real value. But we need to be clear-eyed about what’s happening.

If you’re building with AI, assume your security model is incomplete. Those personality-based jailbreaks aren’t edge cases. They’re hints at fundamental properties of these systems that we don’t fully understand yet.

If you’re using AI to communicate, be skeptical of your own output. That polished bug report might be less accurate than your original rambling draft. The confidence is fake.

And if you’re making decisions based on AI-generated analysis, remember that we’re all flying blind here. Even Google. Even the researchers. Even the people who built the models in the first place.

The gap between deployment and understanding isn’t closing. If anything, it’s widening. That doesn’t mean AI is bad or broken. It means we need to be a lot more honest about what we don’t know.

Right now, we’re breaking these systems faster than we’re building them. That’s not necessarily wrong, but let’s at least admit it.
