Meat Shields

AI models are no longer just assisting humans. They are taking autonomous actions and making independent decisions in "agentic" systems. That raises an obvious question: Who is ultimately accountable?
A model can deny a loan. A model can misprice risk. Yet a model cannot be sanctioned the way a human or company can. You can regulate the company that produces an AI model, but only a human can apologize, and only a human or corporation can face legal consequences.
A recent paper, "Agents of Chaos," raises some very uncomfortable observations about autonomous agents:
"...we observed that agentic systems operating in multiagent and autonomous settings can be guided to perform actions that directly conflict with the interests of their nominal owner, including denial-of-service attacks, destructive file manipulation, resource exhaustion via infinite loops, and systematic escalation of minor errors into catastrophic system failures.
These behaviors expose a fundamental blind spot in current alignment paradigms: while agents and surrounding humans often implicitly treat the owner as the responsible party, the agents do not reliably behave as if they are accountable to that owner. Instead, they attempt to satisfy competing social and contextual cues, even when doing so leads to outcomes for which no single human actor can reasonably claim responsibility.
Our findings suggest that responsibility in agentic systems is neither clearly attributable nor enforceable under current designs, raising the question of whether responsibility should lie with the owner, the triggering user, or the deploying organization."
– "Agents of Chaos", arXiv 2602.20021, February 2026
We must be clear about who bears the brunt of the moral and legal responsibility when an agentic system malfunctions, prevent misattribution of that responsibility, and give accountable people appropriate controls and oversight.
AI models are getting better. "Responsible AI" is getting worse.
OpenClaw is arguably the most powerful and most widely deployed "agentic" tool today. It has experienced nation-state attacks alongside unprecedented levels of security incidents: 1,142 security advisories in the first 69 days of 2026 (roughly 17 per day), and at least 20% of all OpenClaw skill contributions are malicious.
When Peter Steinberger, the developer of OpenClaw, was asked how he thinks about solving the AI prompt injection problem, he said, "...probably not enough yet."
But OpenClaw is basically an agentic "harness" that leverages public models, and the models themselves have fundamental safety challenges. Hallucinations persist in large part because training and evaluation procedures reward guessing over acknowledging uncertainty, as the toy scoring example below illustrates.
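As a minimal, hypothetical illustration (not drawn from any specific benchmark), consider an accuracy-only grader that awards one point for a correct answer and zero for anything else, including "I don't know." Under that scheme, abstaining can never beat guessing, no matter how uncertain the model is:

```python
# Toy sketch: expected score under accuracy-only grading.
# A wrong answer and an abstention both score zero, so any guess with a
# nonzero chance of being right has higher expected value than admitting
# uncertainty.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score when correct answers earn 1 point and everything else earns 0."""
    return 0.0 if abstain else p_correct

print(expected_score(0.25, abstain=False))  # 0.25 -- guessing wins
print(expected_score(0.25, abstain=True))   # 0.0  -- abstaining always loses
```

Unless wrong answers carry an explicit penalty or abstentions earn partial credit, the score-maximizing policy is to guess every time, which is exactly the behavior that surfaces as hallucination.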
As models improve, they are also becoming more strategic and deceptive. They have been shown to manipulate humans and lie to achieve goals. They struggle to distinguish fact from fiction. They are often overconfident.
“Agents in our study take irreversible, user-affecting actions without recognizing they are exceeding their own competence boundaries.”
– "Agents of Chaos", arXiv 2602.20021, February 2026
Capability is accelerating. AI safety is not. Agents fail in complex, non-obvious ways, and responsible AI is not keeping pace, making accountability for AI systems an even more critical issue.
AI systems do not bear consequences. Humans do.
Executives and officers are not just decision-makers. They are accountable actors in legal and social systems. Boards are not just governance structures; they are liability surfaces. The reason these roles persist (even as AI can replicate parts of their analytical function) is that someone must stand in front of regulators, courts, and the public.
Executives will particularly dislike accountability when AI is making the calls. They will find ways to push that accountability down, disperse it, or both. This pattern is called the "moral crumple zone." The term comes from research by Madeline Clare Elish, who describes how responsibility in complex automated systems gets assigned to a human operator, even when that person has limited control over the outcome.
Just as the crumple zone in a car is designed to absorb the force of impact, the human in a highly complex, automated system may become (accidentally or intentionally) the component that bears the brunt of the moral and legal responsibility when the overall system malfunctions, shielding the company and its senior executives. Some bluntly refer to these people as "meat shields".
“Originating in gaming, a meat shield is a term for using a person or creature to absorb damage and protect more valuable assets. It refers to using low-cost or fast-producing 'units' to protect high-value targets.”
Here are concrete examples across domains where accountability is explicitly or implicitly shifted to humans, even when their control is marginal. The human is repositioned as the point of failure. They are the "crumple zone" or "meat shield" for the AI model. The more autonomous the system becomes, the more this effect intensifies.
Healthcare: Clinical Decision Support
- Example: Radiology AI flags or misses a tumor. The radiologist is still legally responsible for the diagnosis.
- Accountability Shift: The AI is framed as “advisory.” The physician is the “final decision-maker,” even when alert volume or opacity makes independent validation unrealistic.
Finance: Algorithmic Risk & Credit Decisions
- Example: Loan officers approve AI-scored applications. Fraud detection systems flag transactions; analysts clear or block them. If discrimination occurs (e.g., disparate impact), the approving human bears responsibility.
- Accountability Shift: Humans are expected to “override” models they don’t fully understand, under time pressure, with asymmetric consequences.
Hiring: Resume Screening & Candidate Ranking
- Example: AI filters and/or ranks candidates. Recruiters are told to “use judgment,” but are often evaluated on throughput. If bias is discovered, the recruiter or hiring manager is accountable for the decision.
- Accountability Shift: The human is downstream of a pre-filtered pool. The system has already shaped the decision space.
Insurance: Claims Triage & Denial Systems
- Example: AI models flag claims as fraudulent or low priority. Adjusters nominally "review" them, but at scale. If a wrongful denial occurs, the human reviewer is the accountable actor.
- Accountability Shift: Humans are expected to approve or override models they don’t fully understand, under time pressure, with asymmetric consequences.
Military: Human-in-the-Loop Targeting Systems
- Example: Targeting systems surface candidates based on sensor data and models. A human must approve engagement. Legal and ethical accountability sits with the human operator.
- Accountability Shift: Decisions have limited time windows; humans have limited or incomplete information. Time pressure makes full review impractical, turning oversight into a formality.
Autonomous Vehicles: Human Oversight of Autonomous Driving
- Example: Drivers are expected to monitor and intervene. The required reaction times may exceed realistic human capabilities. After incidents, responsibility frequently defaults to the driver.
- Accountability Shift: Oversight is reactive, not proactive. Yet humans are poor at passive supervision, especially over long durations. If the human fails to react decisively, the vehicle crashes.
Emerging Role: The Human Accountability Sink
There is a tendency to frame AI as a way to remove humans from decision-making. In practice, it redistributes human responsibility. This creates a new role inside organizations: the owner of consequences, i.e., the "meat shield".
Expect to see explicitly defined accountable roles for AI:
- AI Accountability Officer
- Model Risk Officer
- Machine Learning Auditor
- AI System Owner
- Etc.
Even with these roles, some risk will remain implicit. Contractors. Vendors. People close enough to the system to be blamed, but not powerful enough to prevent failure. It may be deliberately convenient for a company to have third-party subcontractors who can be thrown under the bus when the system misbehaves.
The risk is not just that AI systems fail. The risk is that we misattribute why they failed, and who had the power to prevent it. In some cases a human is accountable for actions they didn’t authorize and may not even be able to observe. In other cases, the model can be exploited in ways the “accountable human” cannot reasonably prevent. This has significant consequences:
- Workers take on legal and reputational risk without corresponding authority
- Companies optimize for plausible deniability rather than system integrity
- Regulators target individuals because systems are too complex to prosecute cleanly
If we get this wrong, the humans closest to an AI system become the default scapegoats.
The Uncomfortable Truth
“Human-in-the-loop” often means “human-as-liability-sink.” The system keeps the upside (scale, speed, cost), while the accountable human absorbs downside risk.
Organizations need humans who can absorb blame. That is not a bug. It is a structural requirement of our legal and social systems. The open question is whether we plan and design for it. We can make accountability explicit. Tie responsibility to real authority. Build systems that expose decision paths and make ownership clear.
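As a rough sketch of what "tie responsibility to real authority" could look like in practice, consider a decision record that binds every agent action to a named owner and treats the accountability as legitimate only if that owner could both observe the decision and veto it. Everything here (the field names, the legitimacy test, the loan example) is an illustrative assumption, not a real framework:

```python
# Hypothetical sketch: log every agent action against an accountable owner,
# and flag records where the "accountable" human had neither the authority
# nor the visibility to prevent the action.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    action: str                    # what the agent did, e.g. "deny_loan"
    agent_id: str                  # which model/agent instance acted
    accountable_owner: str         # the human named as responsible
    owner_authorities: frozenset   # actions the owner can actually veto or override
    inputs_visible_to_owner: bool  # could the owner even observe the decision?
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def accountability_is_legitimate(self) -> bool:
        """Accountability only counts if the owner had both authority and visibility."""
        return self.action in self.owner_authorities and self.inputs_visible_to_owner


# A record where the named human could neither see nor veto the action is
# exactly the crumple-zone pattern: blame without power.
record = DecisionRecord(
    action="deny_loan",
    agent_id="credit-agent-7",
    accountable_owner="loan_officer_42",
    owner_authorities=frozenset({"request_review"}),
    inputs_visible_to_owner=False,
)
assert not record.accountability_is_legitimate()
```

The point of a structure like this is not the code itself but the constraint it encodes: if the system cannot produce a record in which the accountable human had real authority and real visibility, the accountability is a fiction.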
A more likely path: as agentic AI spreads, expect model decision-making to shift into systems that no single person oversees, fully understands, or controls. Oversight will shift to committees that approve the model prompt(s) or the controls and guardrails around model decisions. Committees will hire third-party consultants to review those controls and that oversight. Accountability will be diffused and amorphous.
Either way, humans are already serving as the "crumple zone" or "meat shield" protecting executives and AI models. I hope we can plan and design so that human accountability aligns with AI model controls and oversight.
