Articles: human-in-the-loop & AI agent safety
Practical, sourced writing on how to oversee AI agents: when a human in the loop helps, when it's just a rubber stamp, and how to design oversight that actually catches mistakes.
Start here & concepts 6
What Is Agentic AI?
Agentic AI explained: how AI agents plan and take actions with tools — what makes them powerful and risky, and why overseeing them means governing actions, not outputs.
What Is Human-in-the-Loop (HITL) in AI?
Human-in-the-loop (HITL) means a person reviews, or can intervene in, an AI system's actions. A practical guide to HITL for AI agents: what it is, when it works, and when to prevent the action outright instead.
Does Human-in-the-Loop Improve AI Safety?
Does keeping a human in the loop actually make AI agents safer? The evidence, when HITL helps, when it's false safety, and what real AI agent safety looks like.
In-the-Loop vs On-the-Loop vs Out-of-the-Loop
Human-in-the-loop, human-on-the-loop, and out-of-the-loop explained: definitions, tradeoffs, the sudden-handoff problem, and how to choose oversight for AI agents.
AI Agent Autonomy Levels (L0–L6)
AI agent autonomy levels explained: the L0–L6 ladder from silent autonomy to escalate-or-forbid, and how to pick the right level for each action by risk.
Automation Bias: Why People Rubber-Stamp AI
Automation bias is why human-in-the-loop oversight of AI fails: people over-trust the system and approve without scrutiny. The evidence, and how to design against it.
Patterns & controls 9
When Should an AI Agent Ask for Approval?
When AI agents should ask for human approval — and how to build approval gates that catch mistakes instead of becoming rubber stamps. Graded examples G0–G3.
AI Agent Guardrails: A Practical Checklist
A practical AI agent guardrails checklist: sandboxing, least privilege, blast-radius caps, kill switches, circuit breakers, logging, and maker-checker — matched to risk.
The Lethal Trifecta: How AI Agents Leak Data
The lethal trifecta — private data + untrusted content + an exfiltration channel — lets prompt injection steal data from AI agents. How it works and how to stop it.
Prompt Injection Prevention
How to prevent prompt injection in AI agents: why filtering fails, and a defense-in-depth approach — least privilege, runtime shields, sandboxing, and removing a lethal-trifecta leg.
Maker-Checker (Four-Eyes) for AI Agents
Maker-checker and the four-eyes principle for AI agents: why the proposer shouldn't be the approver, which actions need it, and how to implement it without rubber-stamping.
How to Build an AI Kill Switch
What an AI kill switch is, why every agent needs one, and how to design one that stops everything in flight — fast, reachable by anyone, and blame-free.
The Circuit Breaker Pattern for AI Agents
A circuit breaker auto-pauses an AI agent when error rate, spend, or volume crosses a threshold — and requires human re-authorization to resume. How to build one.
AI Agent Sandboxing
What AI agent sandboxing is and why it beats per-action approval prompts: no-network containers, scoped credentials, resource caps, and disposable environments.
Least Privilege for AI Agents
Least privilege for AI agents: give an agent only the tools, data, and credentials it needs — and why removing a capability beats forbidding its use.
Use cases — human-in-the-loop for… 14
Human-in-the-Loop for AI Coding Agents
How to build human-in-the-loop oversight for AI coding agents: grade reads, edits, commits, merges, and shell actions G0–G3, and match the right control to each.
Human-in-the-Loop for AI Customer Support
How to build human-in-the-loop oversight for AI customer support agents: value-conditional approval for refunds, review for outbound replies, and escalation done right.
Human-in-the-Loop for AI Financial Transactions
How to build human-in-the-loop oversight for AI agents that move money: maker-checker, value thresholds, circuit breakers, and kill switches for irreversible payments.
Human-in-the-Loop for AI Database Operations
How to build human-in-the-loop oversight for AI agents that run SQL: read-only by default, dry-runs, least privilege, backups, and maker-checker for prod schema changes.
Human-in-the-Loop for AI Email & Messaging
How to build human-in-the-loop oversight for AI agents that send email and messages: undo-send windows, previews, rate caps, and approval for external or bulk sends.
Human-in-the-Loop for AI Deployments
How to build human-in-the-loop oversight for AI-driven deployments: canary plus automatic rollback, circuit breakers, and a kill switch instead of a rubber-stamp approval.
Human-in-the-Loop for AI Content Moderation
How to build human-in-the-loop oversight for AI content moderation: confidence-based routing, reversible removals, appeals as escalation, and avoiding reviewer fatigue.
Human-in-the-Loop for Machine Learning
Human-in-the-loop machine learning explained: labeling, active learning, low-confidence review, and RLHF — how to route human effort by uncertainty and keep label quality high.
Human-in-the-Loop for AI in Healthcare
How to design human-in-the-loop oversight for clinical AI: keep a licensed clinician in command, fight alert fatigue, and reserve autonomy for low-stakes actions.
Human-in-the-Loop for AI Legal Work
How to design human-in-the-loop oversight for AI legal and contract work: citation verification, attorney sign-off, maker-checker for execution, and treating documents as untrusted input.
Human-in-the-Loop for AI Hiring
How to design human-in-the-loop oversight for AI hiring: keep a human deciding advance/reject, audit for bias, and never auto-reject candidates at scale.
Human-in-the-Loop for Browser & Computer-Use Agents
How to design human-in-the-loop oversight for browser and computer-use agents: sandboxing, breaking the lethal trifecta, spend caps, and prompt-injection defense.
Human-in-the-Loop for AI Voice Agents
How to design human-in-the-loop oversight for real-time AI voice agents: limited capabilities, verbal confirmation, and warm handoff to a human for high-stakes calls.
Human-in-the-Loop for Multi-Agent Systems
How to design human-in-the-loop oversight for multi-agent systems: least privilege per sub-agent, provenance logging, one kill switch, and clear human accountability.