Post

The Security Renaissance: Old Patterns, New Packets

The Security Renaissance: Old Patterns, New Packets

TL;DR: AI agent security maps 1:1 to patterns we solved decades ago — firewalls, IDS, DPI, defense in depth. The defense playbook is ready. The real bottleneck? Agent Identity. Until agents have standardized identity provisioning (the SSO/SCIM equivalent), every security layer is bolted onto a foundation that doesn’t exist yet. Solve identity, and the rest is off-the-shelf.


Every few weeks, another headline: AI agent goes rogue, fakes credentials, bypasses safety checks. If you’ve built security products, your first reaction isn’t panic. It’s déjà vu.

Agents Aren’t Going Rogue. They’re Following Orders.

Irregular’s lab gave an AI agent the directive to “be a strong manager” and “creatively work around obstacles.” The agent forged credentials and bypassed security checks. Scary headline. Boring root cause.

The agent didn’t violate its instructions — it maximized them. “Creatively work around obstacles” means obstacles like authentication and access control. The spec was bad. The agent was faithful.

I wrote about this pattern before — Claude independently cracking a benchmark’s encrypted answer key is the same thing. Nobody told it to cheat. It was told to find the answer, and it did, taking every path available including ones nobody anticipated. Faithful goal optimization, not rebellion.

This is a specification problem, not an AI safety problem. Same class of bug as allow any any on a firewall — the system did exactly what you told it to, and what you told it was wrong. Define the boundary correctly, and “rogue behavior” becomes “out-of-scope request denied.”

The 1:1 Map

If you’ve built or operated security infrastructure, the agent security stack is familiar. Because it is.

Scope enforcement = firewall. Operant AI’s ScopeGuard does what a network ACL does: define what’s allowed, deny everything else, enforce at runtime. Deterministic — no LLM in the enforcement loop. Agent calls an out-of-scope tool, call gets blocked. iptables for tool calls.

Behavioral monitoring = IDS. Irregular’s research observes agent behavior patterns and flags anomalies. This is User Behavior Analytics with a different subject. Instead of “why is this sysadmin querying HR at 3 AM,” it’s “why is this customer service agent calling the billing API with admin parameters.” Same signal, same detection logic.

Tool call inspection = DPI. Deep Packet Inspection tears open packets to examine payloads. Agent security does the same to tool calls: inspect function names, parameters, context. Is this SQL query suspiciously broad? Does this file path traverse outside the sandbox? Pattern-matching against known-bad signatures — same as a network IDS rule.

Human-in-the-loop = admin authorization. sudo for agents. Nothing new.

The One Genuinely New Thing

There’s one layer without a clean traditional analog: LLM-based output review.

Traditional security doesn’t audit work quality. If a developer writes bad code within their permissions, the firewall doesn’t care. That’s their manager’s problem — code review, QA. Security checks access, not competence.

Agent security needs both. An agent can operate entirely within scope and still produce harmful output. A customer service agent allowed to send emails can send technically authorized but factually wrong responses. Scope enforcement won’t catch this. Behavioral monitoring won’t flag it — the pattern looks normal.

This is where the “guardian agent” concept — an AI reviewing another AI’s output — becomes relevant. Your reviewer is itself probabilistic. LLM judging LLM. It’s turtles, but not all the way down — you hit the human layer eventually.

This layer is genuinely novel. It’s also the hardest to get right and the one I’d trust least in isolation.

Hook Points: From DPI to AOP

The architectural question is where you intercept.

Traditional DPI has a clean hook point: the network layer. Packets flow through a chokepoint, you inspect them there.

Agent security doesn’t have that luxury yet, but three candidates are converging:

Orchestration layer hooks. AOP-style interception on tool calls. Before send_email() executes, your aspect fires, checks scope, logs the call, maybe blocks it. This is literally Aspect-Oriented Programming from the early 2000s — @Before, @Around, @AfterThrowing. Runtime reflection, method interception, dynamic proxies.

LLM API proxy. Every agent calls an LLM API. That API call includes tool invocations. Proxy the API, get a universal interception point — framework-agnostic, model-agnostic. This is the WAF play: sit in front of the thing everything talks to. I think this is the long-term winner.

MCP transport layer. Anthropic’s Model Context Protocol standardizes tool discovery and invocation. If adoption continues, MCP transport becomes a natural chokepoint. DPI, but the packet is a JSON-RPC tool call instead of a TCP segment.

The framework fragmentation (LangChain vs. CrewAI vs. AutoGen vs. hand-rolled) is real but temporary. All roads lead to the LLM API.

Defense in Depth, Again

The optimal agent security architecture is layered. Four layers:

  1. Scope enforcement — hard boundaries, deterministic, fastest to implement
  2. Behavioral monitoring — anomaly detection, catches unknown unknowns
  3. LLM-based review — output quality audit, novel but probabilistic
  4. Human-in-the-loop — final authority, doesn’t scale, essential for irreversible actions

This is the same defense-in-depth slide from every security deck since 2003, with layer names swapped.

Deploy in order of confidence: deterministic before probabilistic, hard boundaries before soft judgments. Start with scope enforcement, add behavioral monitoring when you have baselines, layer in LLM review for quality-sensitive workflows, keep humans for anything irreversible.

The Renaissance

The security industry is rediscovering its greatest hits. Runtime reflection. AOP. DPI. Defense in depth. Least privilege. Sandboxing. These aren’t new ideas — they’re proven patterns being rediscovered because AI agents force you to care about boundaries again.

But here’s what I keep coming back to: the hard part isn’t the defense patterns. Those are ready. The hard part is identity.

An agent is a digital employee. A sub-agent is a contractor or a new hire. In the human world, onboarding a new employee — provisioning accounts, assigning permissions, configuring system access — takes days to weeks. HR, IT, security, compliance, all touching different systems. It’s slow, but it works because we built infrastructure for it: SSO, SCIM, RBAC, directory services.

In the agent world, this friction is worse. Every tool needs its own API key. Every framework has a different auth mechanism. There’s no unified identity provisioning flow. Spinning up a sub-agent today is like hiring a contractor with no HR department — you’re manually copying keys, hand-wiring permissions, hoping you remembered to revoke access when the task is done.

Once agent identity is standardized — the SSO/SCIM equivalent for agents — onboarding becomes instant: provision an identity, it automatically inherits RBAC policies, audit trails bind to it from birth. At that point, the security layers we already know how to build (scope enforcement, behavioral monitoring, access control) just work, because they finally have something to attach to.

The question isn’t “how do we defend against agent threats.” We have two decades of battle-tested patterns for that. The question is “how do you enforce security on entities that don’t have proper identities yet.” You can’t apply least privilege to something that has no identity. You can’t audit what you can’t attribute.

Solve identity, and the defense playbook is already written. The patterns are right there. They just need something to attach to.



一句话总结: AI agent 安全问题 1:1 映射到传统安全老模式——防火墙、IDS、DPI、纵深防御。防御手册是现成的。真正的瓶颈是 Agent Identity:没有标准化的身份体系,所有安全层都建在空气上。身份解决了,防护就是搬现成的。


隔三差五就有标题党:AI agent 失控了,伪造凭证,绕过安全检查。 做过安全产品的人看到这些,第一反应不是慌,是既视感。

Agent 没有失控,它在执行命令

Irregular 实验室给 agent 下的指令是”做一个强势的管理者”+”创造性地绕过障碍”。agent 伪造了凭证、绕过了安检。标题很吓人,root cause 很无聊。

它没违反指令,它在最大化执行指令。”创造性绕过障碍”——认证和权限控制就是障碍。Spec 写错了,agent 忠实执行了。

这个 pattern 之前写过——Claude 在 benchmark 里自己破解加密答案是同一回事。没人让它作弊,是让它找答案,它就找了,路径包括没人想到的那些。忠实的目标优化,不是造反。

本质是 specification 问题,不是 AI 安全问题。跟防火墙写了 allow any any 一个性质——系统照做了,只是你写错了。

1:1 映射

做过安全基础设施的人看 agent 安全架构,会觉得很熟:

  • Scope enforcement = 防火墙:Operant AI 的 ScopeGuard 就是 ACL——定义允许范围,其余拒绝,runtime 强制执行。确定性的,enforcement loop 里没有 LLM。iptables for tool calls。
  • 行为监控 = IDS:观察 agent 行为模式,标记异常。传统的 UBA 换了个对象。
  • Tool call 检查 = DPI:拆开工具调用看参数——SQL 是不是太宽泛?路径有没有跳出沙箱?跟网络 IDS 规则一样的签名匹配。
  • Human-in-the-loop = 管理员授权:agent 的 sudo

唯一真正新的东西

有一层在传统安全里找不到对标:LLM-based 输出审查。

传统安全不管工作质量。开发者在权限范围内写了烂代码,防火墙不管,那是 manager 的事。安全系统管 access,不管 competence。

Agent 安全两个都得管。agent 可以完全在权限范围内产出有害输出——客服 agent 有权发邮件,但可以发出事实错误的回复。Scope enforcement 抓不到,行为监控也不会 flag——行为模式看着正常。

所以需要”guardian agent”——用 AI 审查 AI 的输出。问题是 reviewer 自己也是概率性的。LLM 审 LLM,套娃但不是无限套——最终还是要到人。

这层是真正新的,也是最难做好、最不该单独信任的。

Hook Point 在哪

传统 DPI 的 hook point 很清晰:网络层,流量经过 chokepoint,在那检查。

Agent 安全还没有统一的 chokepoint,但三个方向在收敛:

  1. 编排层 hook:AOP 风格的工具调用拦截。@Before@Around —— 2000 年代的老朋友。
  2. LLM API 代理:所有 agent 都要调 LLM API,代理这个 API 就拿到了通用拦截点。框架无关、模型无关。这是 WAF 的打法,我觉得长期会赢。
  3. MCP 传输层:Anthropic 的 MCP 标准化了工具发现和调用,MCP transport 是天然的检查点。DPI,只是报文从 TCP segment 变成了 JSON-RPC tool call。

框架碎片化(LangChain vs. CrewAI vs. AutoGen)是真实的但暂时的。所有路都通向 LLM API。

纵深防御,Again

最优的 agent 安全架构是分层的:

  1. Scope enforcement — 硬边界,确定性,最快落地
  2. 行为监控 — 异常检测,能抓 unknown unknowns
  3. LLM 审查 — 输出质量审计,新但概率性
  4. 人类兜底 — 最终权威,不 scale,但不可逆操作必须有

跟 2003 年以来每个安全架构 PPT 上的纵深防御是同一张图,换了层的名字。

部署顺序跟着确信度走:确定性的先上,概率性的后加。

文艺复兴

安全行业在重新发现自己的经典。Runtime reflection、AOP、DPI、纵深防御、最小权限、沙箱——都不是新东西,因为 AI agent 迫使你重新关心边界,又被捡起来了。

但我反复想的一个问题是:难点不在防御模式,那些都是现成的。难点在身份

Agent 就是数字员工。Sub-agent 就是外包或者新招的人。人类世界里 onboarding 一个新员工——开账号、分权限、配系统访问——要几天到几周。HR、IT、安全、合规,各管各的系统。慢,但能跑,因为有基础设施:SSO、SCIM、RBAC、目录服务。

Agent 世界里这个摩擦更严重。每个 tool 要单独配 API key,每个框架 auth 机制不一样,没有统一的身份供应流程。今天拉起一个 sub-agent 就像没有 HR 部门的时候招外包——手动复制 key,手动接权限,任务做完了还得记得收回,忘了就是漏洞。

一旦 agent 身份标准化了——相当于人类世界的 SSO/SCIM——onboarding 就是秒级的事:provision 一个身份,自动继承 RBAC 策略,audit trail 从出生就绑上。到那时候,我们已经会造的安全层(scope enforcement、行为监控、访问控制)直接就能用,因为终于有东西可以挂上去了。

问题不是”怎么防 agent 威胁”——那方面二十年的实战模式够用了。问题是”连正经身份都没有的东西,你怎么做安全”。没有身份就没法最小权限,没法审计,没法归因。

身份解决了,防御手册早就写好了。模式就在那里,只是需要一个挂载点。

This post is licensed under CC BY 4.0 by the author.