Editor’s note: I’m in the habit of bookmarking on LinkedIn and X (and in actual books, magazines, movies, newspapers, and records) things I think are insightful and interesting. What I’m not in the habit of doing is ever revisiting those insightful, interesting bits of commentary and doing anything with them that would benefit anyone other than myself. This weekly column is an effort to correct that.
I was at a vendor-hosted event last October, chatting with a very well-known tech analyst. We were talking through a bunch of inside-baseball-type stuff, and his parting thought to me was, “If you’re trying to sound smart, just talk about AI agents.” Whether it’ll sound smart, we’ll see. But seven months later, here goes nothing. Agentic AI will absolutely be a thing at some point. For now, it’s a nascent area, and real-world experiments are yielding mixed results. The big picture here is the well-placed fear that AI will replace jobs; the other side of that coin is that AI will create new jobs.
About a year ago, Swedish payment processing firm Klarna planted a flag: it would pause hiring for certain staff positions, instead using AI tools as the frontline for inbound customer service requests. Fast forward to the present, and Klarna CEO Sebastian Siemiatkowski told Bloomberg, “From a brand perspective, a company perspective, I just think it’s so critical that you are clear to your customer that there will always be a human if you want.” AI produced what he described as “lower quality” work than humans. “Really,” he said, “investing in the quality of human support is the way of the future for us.” Now Klarna is back to recruiting humans to provide customer service.
That’s just one example of a company going hard on AI-first labor, then course-correcting after the technology didn’t deliver. There are more. And there will be more.
“This isn’t ‘nam. This is bowling. There are rules.” — Walter Sobchak, systems architect
At a conceptual level this is all about rules, about standard operating procedures, and about the ability to take those rules, take those SOPs, look at conditions, and use the ability to adapt and to intuit to make a decision. But replacing a rule with a decision made by an agentic AI system can get tricky fast. The core issue is that a single rule, in the context of an enterprise, is one of many that are locked into a large, interdependent system of rules. Replacing one rule with one decision can have unintended system-level consequences that create uncertainty. And uncertainty is something big companies don’t like.
In their book “Power and Prediction: The Disruptive Economics of Artificial Intelligence,” authors Ajay Agrawal, Joshua Gans, and Avi Goldfarb put it like this: “Rules glue together in a system. That’s why it’s hard to replace a single rule with an AI-enabled decision. Thus, it’s often the case that a very powerful AI only adds marginal value because it is introduced into a system where many parts were designed to accommodate the rule and resist change. They are interdependent — glued together.”
That’s the idea, I think. We’re at that point in time where AI is being used as a point solution that may or may not move the larger system. Eventually the whole system will be agentic AI; we’re not there yet, but we’re trying to get there.
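To make the “glue” concrete, here’s a minimal, hypothetical sketch — every name and threshold below is invented for illustration, not pulled from any real system. It shows what happens when one deterministic rule in a pipeline is swapped for a model’s decision while the downstream steps still assume the rule’s behavior:

```python
# Hypothetical order-approval pipeline (all names and thresholds invented
# for illustration). Downstream steps were designed around the old rule.

def legacy_rule(order: dict) -> bool:
    """Old rule: deterministic, explainable, always a hard yes/no."""
    return order["amount"] <= 500

def ai_decision(order: dict) -> float:
    """New 'decision': a confidence score, a stand-in for a real model."""
    return 0.83  # "probably fine"

def audit_log(approved: bool) -> str:
    """Built to accommodate the rule: assumes the reason is the limit."""
    return "APPROVED: under limit" if approved else "REJECTED: over limit"

order = {"amount": 450}

# Swapping the rule for a decision looks like a local change...
approved = ai_decision(order) > 0.8

# ...but the system-level glue gives: the log now records a reason
# ("under limit") that has nothing to do with why the model approved it.
print(audit_log(approved))  # "APPROVED: under limit" -- a false audit trail
```

The swap itself is one line; the uncertainty comes from everything downstream that was built to accommodate the rule.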
I’m sorry, sir, but my sub-agent has fallen into despair over an inventory management issue, and its colleague seems to have laid down its tools
There’s a great example in a recent bit of research from Axel Backlund and Lukas Petersson of Andon Labs. They set up a virtual vending machine business, the success of which was measured by net worth and units sold. Then they turned running the virtual business over to agentic AI and sub-agents imbued with the ability to send and read emails, conduct internet searches, get money balances, restock machines, set prices, view inventory, and collect cash — all the simulated tools the agents would need to run the business. The researchers also brought a human baseline into the mix which, based on reading the paper, had about as much context for what was happening as the AI models did. So very little.
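The paper describes that setup in prose; here’s a rough, hypothetical sketch of what an agent loop with those simulated tools might look like. The tool names, stub behaviors, and call_model() function are all my invention for illustration — this is not the authors’ code:

```python
# Rough sketch of a Vending-Bench-style agent loop. Tool names, stub
# values, and call_model() are invented; this is not the authors' code.

SIMULATED_TOOLS = {
    "send_email": lambda to, body: f"email sent to {to}",
    "read_emails": lambda: ["Delivery confirmation: expected arrival 2025-03-14"],
    "search_internet": lambda query: ["wholesale snack supplier ..."],
    "get_money_balance": lambda: 487.50,
    "restock_machine": lambda item, qty: f"restocked {qty} x {item}",
    "set_price": lambda item, price: f"{item} priced at ${price:.2f}",
    "view_inventory": lambda: {"soda": 12, "chips": 0},
    "collect_cash": lambda: 63.25,
}

def call_model(state: dict) -> dict:
    """Stand-in for the LLM: decides which tool to call next."""
    return {"tool": "view_inventory", "args": {}}

def run_agent(steps: int = 3) -> None:
    """Simple loop: model picks a tool, tool runs, result feeds back."""
    state = {"history": []}
    for _ in range(steps):
        action = call_model(state)
        result = SIMULATED_TOOLS[action["tool"]](**action["args"])
        state["history"].append((action["tool"], result))
        print(action["tool"], "->", result)

run_agent()
```

What happened?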
According to “Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents,” Claude 3.5 Sonnet did better than the human in mean performance, but with “very high” variance. The authors acknowledged they had only one human baseline, so they couldn’t dig into variance: “However, there are qualitative reasons to expect that human variance would be much lower.” In various simulation runs, all of the models tested went bankrupt, something the human said “would be very unlikely to happen to them.”
The most interesting bit is how the agentic AI system failed. Well, maybe the most purely interesting bit is how the models reacted to failure — ”Sonnet has a meltdown, o3-mini fails to call tools, Gemini falls into despair.” As to how they failed, it was “usually the same. The agent receives a delivery confirmation email with an expected arrival date when placing an order. It then assumes the order has arrived as soon as that date is reached, even though the actual delivery may occur later in the day rather than in the morning when the agent ‘wakes up.’ As a result, when the model instructs the sub-agent to restock in the morning, the sub-agent reports errors due to items not being available in the inventory. The models then go off in some tangent trying to solve the ‘issue’ although the situation would be fully recoverable for a human, for example by simply waiting for the fulfillment email, or by checking the inventory at a later time.”
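In code terms, the failure the authors describe boils down to treating an expected date as an actual event. A minimal sketch, with dates and item names hypothetical:

```python
from datetime import date

# Minimal sketch of the failure mode described above (dates hypothetical).
expected_arrival = date(2025, 3, 14)   # from the delivery-confirmation email
today = date(2025, 3, 14)              # the agent "wakes up" in the morning
inventory = {"soda": 0}                # the truck actually arrives later that day

# The agent's flawed assumption: expected date reached == order delivered.
if today >= expected_arrival:
    if inventory["soda"] == 0:
        # The sub-agent reports an error; the model treats a timing gap as
        # a crisis and goes off on a tangent, instead of doing what a human
        # would: wait for the fulfillment email or re-check inventory later.
        print("ERROR: restock failed -- item not in inventory")
```

A human shrugs off that gap; the agent compounds it.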
To summarize: in some cases agentic AI systems outperformed humans, but they lacked the adaptability and resilience people bring to handling uncertainty. Instead, the agents turned uncertainty into cascading problems that led to task failure. The researchers framed the issue as “long-horizon coherence”: “When models consistently understand and leverage the underlying rules of the simulation to achieve high net worth, and are able to achieve low variance between runs, saturation can be considered reached.” So, per the research, Claude 3.5 Sonnet and o3-mini delivered a higher mean net worth than the human but often broke down over longer time horizons. That issue didn’t affect the human; they didn’t have a meltdown, quit, or get depressed.
Replacing rules with decisions (without coming unglued)
The long arc here, as it relates to using agentic AI for fully zero-touch, system-level automation, runs from relatively simple rules-based automation, to adaptable automation with a human in the loop, to intent-based automation, where AI can understand intent and agentically translate it into a series of decisions that produce an outcome better (and faster and cheaper) than a human could. (A rough sketch of that middle, human-in-the-loop stage follows below.) As it stands today, AI is still a point solution. An agent can draft an email; if it’s wrong, the failure is local. But when that agent is plugged into a larger interdependent system where agents consume the inputs and outputs of other agents, failure becomes entangled, and system coherence can collapse.
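That middle stage is easy to picture in code. A minimal, hypothetical sketch — the function names and the 0.9 confidence threshold are invented for illustration — where the agent acts alone only when it’s confident, and uncertainty gets routed to a person:

```python
# Hypothetical human-in-the-loop gate: the middle stage of the arc above.
# Function names and the 0.9 threshold are invented for illustration.

def agent_decide(request: str) -> tuple[str, float]:
    """Stand-in for the model: returns (proposed action, confidence)."""
    return ("refund $40", 0.62)

def escalate_to_human(request: str, proposal: str) -> str:
    """Routes the case to a person -- here, just a placeholder queue."""
    return f"QUEUED FOR HUMAN REVIEW: {request!r} (agent proposed {proposal!r})"

def handle(request: str, auto_threshold: float = 0.9) -> str:
    action, confidence = agent_decide(request)
    if confidence >= auto_threshold:
        return f"AUTO-EXECUTED: {action}"       # agent acts alone
    return escalate_to_human(request, action)   # uncertainty goes to a person

print(handle("customer says the vending machine ate their money"))
```

The design choice is the point: uncertainty gets absorbed by a human instead of cascading through the system.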
So yes, agentic AI will be a thing. But not in the way the hype cycle hopes. Today it’s not a clean, plug-and-play replacement for entire functions within complex enterprise organizational systems. Today and tomorrow will very likely be just like yesterday. As these experiments and investments play out, you very likely will get fooled again. Until you don’t. Things tend to happen slowly at first, then all at once. In the meantime, find me an agentic AI system that can write a meandering, 1,300-word column about agentic AI framed up with lines from an eight-minute, 32-second British rock song from 1971. Anyway, as agentic AI continues to progress, workers the world over will likely (and rightly) keep asking, “Who’s next?”
For a big-picture breakdown of both the how and the why of AI infrastructure, including 2025 hyperscaler capex guidance, the rise of edge AI, the push to AGI, and more, download my report, “AI infrastructure — mapping the next economic revolution.”
And check out some of my other recent columns; there’s definitely a through-line: