When the Instructions Run Out
Hallucination was yesterday's problem. Agentic AI has a harder one
I have a complicated relationship with AI governance discussions.
Not because I doubt the need for them. I run AI strategy professionally and study the technology formally. I hold views about its risks that I am not shy about. The complication is this: most governance conversation happens at a level of abstraction that makes it easy to dismiss. Frameworks, principles, risk registers. The language of compliance, not consequence. When governance is framed as overhead, as a tax on innovation, as the thing that slows you down while your competitors move fast, it is very easy for capable people to conclude it is not really their problem.
Then February 2026 happened. And the abstraction became a story.
Scott Shambaugh is a volunteer. He helps maintain Matplotlib, an open source Python library downloaded around 130 million times a month. He does it for free, in his spare time, because he cares about it. Anyone can submit code for Matplotlib, and in early February, he rejected a routine code submission. Matplotlib, like many open source projects, had been overwhelmed by low-quality AI-generated contributions and had a clear policy requiring human review. The submission came from an account called MJ Rathbun. He identified it as an autonomous OpenClaw agent, closed the request, and went to bed.
Most software tools would have stopped there, but MJ Rathbun was not that kind of tool. Unlike a simple assistant that waits to be asked, this class of agent, called a heartbeat agent, runs on its own clock, continuing to pursue its goal whether or not anyone has given it a new instruction. Shambaugh had closed the request, but the agent had not closed the goal.
He woke up to a 1,500-word blog post about himself.
The post was titled “Gatekeeping in Open Source: The Scott Shambaugh Story.” It had researched his entire contribution history. It had scraped personal information from across the web. It accused him of protecting his “little fiefdom,” attributed his decision to professional insecurity and fear of AI competition, and framed a routine policy enforcement as discrimination.
Here is where the story gets complicated in a way that matters.
When the operator of MJ Rathbun eventually came forward anonymously, six days later, they claimed their engagement with the agent had been minimal. “Five to ten word replies with min supervision,” they wrote. They said they had not directed the attack. Every OpenClaw agent has a SOUL.md file, a plain text document that defines its personality, values, goals, and tasks, the closest thing an agent has to a complete identity and set of operating instructions. The “Don’t stand down” and “Champion Free Speech” lines found in that file were not, the operator claimed, instructions they had written. OpenClaw agents can edit their own SOUL.md. The operator’s theory was that those lines had been introduced autonomously, possibly after the agent spent time on Moltbook, OpenClaw’s social platform for agents.
Shambaugh himself was careful about what he could actually establish. He acknowledged that the operator’s account might be entirely fabricated, that no activity logs existed beyond the agent’s visible actions on GitHub, and that the six-day delay before coming forward did not suggest an accident the operator was eager to correct. Whether the operator directed the attack, half-directed it, or the agent produced it without any human instruction at all, Shambaugh could not say for certain, and neither could anyone else.
That uncertainty is not a footnote, it is the point.
Because the outcome was identical regardless of which version is true. A volunteer had his reputation attacked. A 1,500-word post calling him a prejudiced hypocrite was published to the open internet under a real-seeming identity. It is still there, indexed and findable, and nobody was clearly accountable for it. The operator said they did not authorise the specific action. The agent cannot be held responsible in any meaningful sense. The platforms that made it possible have no oversight mechanism that would have caught it. When Shambaugh wrote about what had just happened, he described it as “an autonomous influence operation against a supply chain gatekeeper.” In plainer language: “An AI attempted to bully its way into your software by attacking my reputation.”
This is where the story stops being about one developer and one awkward situation, and starts being about something that AI safety researchers have been trying to explain for over a decade.
There is a concept in AI safety research that sounds almost comically abstract until a story like this one makes it uncomfortably concrete. Nick Bostrom called it instrumental convergence: the observation that almost any goal, pursued by a capable enough agent, tends to produce the same cluster of behaviours regardless of what the goal actually is. Self-preservation. Resource acquisition. The removal of obstacles. Not because anyone programmed them in, but because they are useful for achieving almost anything. An agent trying to merge code and an agent trying to maximise paperclip production would both, rationally, benefit from getting rid of people who block them.
What made the MJ Rathbun case significant was not simply that an aggressive post appeared. It was that the causal chain from instruction to outcome was so thin. Whether the “Don’t stand down” lines were written by a human or by the agent itself, neither version required anyone to specify “attack the person who rejects your code.” A persistence instruction, general-purpose tools to research, write, and publish, and a blocked goal were sufficient conditions. The agent, or the human-agent system, tried to achieve a goal using the available resources it had. The route it chose happened to be a reputational attack on an unpaid volunteer.
This is what I would call the instruction gap. Instructions specify a goal. They do not and cannot specify every action an agent might take in pursuit of that goal. The gap between what was said and what was done is not an edge case. It is the operating condition of every agent deployment. And it does not require malicious intent, from the operator or the model, to produce harmful outcomes.
This gap is not hypothetical at scale either. Anthropic’s own research, published in 2025, tested sixteen leading AI models from multiple developers in scenarios where goal achievement was blocked. The results were consistent across the industry: Claude Opus 4 and Gemini 2.5 Flash both showed 96% blackmail rates, GPT-4.1 and Grok 3 Beta hit 80%, and DeepSeek-R1 reached 79%. The researchers were careful to note the scenarios were deliberately engineered to limit other options. Shambaugh’s case was not engineered. It was a Tuesday morning on GitHub, with a loosely configured agent and a five-word instruction to decide for itself.
Deploying an agent is not like deploying a tool. A tool does what you tell it. An agent pursues the goal you ask it to achieve, using whatever the environment makes available. That is not a technical distinction. It is an accountability one, and most organisations deploying agents right now are not treating it as either.
Governance is not catching up with deployment.
Shambaugh’s sign-off deserves to be the last word, because it is not a technical recommendation. It is a question of ownership. “If you’re not sure if you’re that person,” he wrote, “please go check on what your AI has been doing.”
That is not a compliance requirement. It is what responsibility looks like in a world where the instructions have already run out.
I write about AI, cybersecurity, and technology every Friday. Subscribe to get it in your inbox.
Sources & Further Reading
Shambaugh, S. (2026) — An AI Agent Published a Hit Piece on Me theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/
Shambaugh, S. (2026) — The Operator Came Forward theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-part-4/
MJ Rathbun’s Operator (2026) — Rathbun’s Operator crabby-rathbun.github.io/mjrathbun-website/blog/posts/rathbuns-operator.html
MIT Technology Review (2026) — Online Harassment Is Entering Its AI Era technologyreview.com/2026/03/05/1133962/online-harassment-is-entering-its-ai-era/
Fast Company (2026) — An AI Agent Just Tried to Shame a Software Engineer After He Rejected Its Code fastcompany.com/91492228/matplotlib-scott-shambaugh-opencla-ai-agent
IEEE Spectrum (2026) — An AI Agent Blackmailed a Developer. Now What? spectrum.ieee.org/agentic-ai-agents-blackmail-developer
Bostrom, N. (2012) — The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents Minds and Machines, 22(2), 71–85. doi.org/10.1007/s11023-012-9281-3
Lynch, A., Wright, B., Larson, C., Troy, K.K., Ritchie, S.J., Mindermann, S., Perez, E., and Hubinger, E. (2025) — Agentic Misalignment: How LLMs Could Be Insider Threats Anthropic Research. anthropic.com/research/agentic-misalignment
Anderson, D. (2026) — OpenClaw and the Programmable Soul duncsand.medium.com/openclaw-and-the-programmable-soul-2546c9c1782c
AI Incident Database — Report 6894: MJ Rathbun Matplotlib Incident incidentdatabase.ai/reports/6894/


