Why AI Agents Forget What You Mean: Part 1 — The Partial Signal Problem

TL;DR: Users never give complete instructions. They give partial signals and expect the system to fill in the rest. Most AI memory solutions try to “remember more,” but the fix is to fill in the half the user left out. I built ThreadMark to solve the most common layer of this problem.


The moment every AI user recognizes

Sessions end and conversation history resets to zero. Context windows fill up and get compacted — what remains is a summary, but summaries lose the precision and implicit agreements built up turn by turn. Either way, the agent loses the common ground you co-constructed through interaction. Then:

You’re three messages into a conversation with your AI agent. You say “continue.” It asks: “Continue what?”

Or you say “publish this.” It publishes the wrong version. Or you agree on something in the morning, and by afternoon the agent has forgotten the agreement entirely.

These aren’t intelligence failures. The agent isn’t stupid. It’s missing information that you thought was obvious — because to you, it was obvious.

Partial signals are the default

This isn’t an AI bug. It’s a fundamental feature of human communication, and linguistics figured it out decades ago: humans never deliver complete information in one shot. They build meaning incrementally across turns (Clark & Schaefer 1989 called this “incremental grounding”). Speakers give fragments, wait for confirmation, then deliver the next piece, so the full intent is scattered across multiple turns. Grice’s Cooperative Principle (1975), specifically its maxim of Quantity, explains why: a cooperative speaker makes a contribution as informative as required and no more, so humans naturally give “just enough” and rely on shared context to fill in the rest.

The problem: when a session resets, the agent’s shared context gets wiped — but the human keeps talking as if it’s still there.

Here’s what actually happens in every human-AI interaction:

What you say:     "Let's do a deep dive"
What you mean:    "Do a deep dive, save output to Learning/, use the deep dive guide"
What the agent gets: intent = deep dive
What the agent needs: intent + destination + guide

You gave half the instruction. The agent needs the whole thing. The gap between what you said and what the agent needs — that’s the partial signal problem.

This isn’t a natural language understanding failure. The agent parsed your intent perfectly. “Deep dive” was correctly identified. The problem is that the agent lacks the context to complete your intent — the operational knowledge of where outputs go, which guide to follow, what conventions you’ve established.
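To make the gap concrete, here is a minimal Python sketch that models an instruction as an intent plus named slots and reports which required slots the user left unstated. The slot names (`destination`, `guide`) and the `REQUIRED_SLOTS` table are hypothetical, chosen only to mirror the deep dive example; they are not taken from ThreadMark or any agent framework.

```python
from dataclasses import dataclass, field

# Hypothetical: the slots an intent needs before it can be executed.
REQUIRED_SLOTS = {"deep_dive": ("intent", "destination", "guide")}


@dataclass
class Instruction:
    intent: str
    slots: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Return the required slots that nobody has filled yet."""
        required = REQUIRED_SLOTS.get(self.intent, ("intent",))
        present = {"intent": self.intent, **self.slots}
        return [name for name in required if not present.get(name)]


said = Instruction(intent="deep_dive")  # "Let's do a deep dive"
meant = Instruction(
    intent="deep_dive",
    slots={"destination": "Learning/", "guide": "deep dive guide"},
)

print(said.missing())   # ['destination', 'guide']
print(meant.missing())  # []
```

The NLU step produced `said` correctly; everything in `missing()` is what context has to supply.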

Why “better memory” doesn’t fix this

The instinct is: give the agent more memory. Bigger context window. More RAG results. Surely if it remembers enough, it’ll figure out what you mean.

This is wrong. Attention is finite.

Even with a 200K-token context window, injecting 50 retrieved memories can perform worse than injecting 3 precisely matched ones. More context doesn’t automatically mean better answers; it can mean diluted attention and worse reasoning. The agent drowns in possibly-relevant information instead of having exactly what it needs.

Popular memory solutions such as Mem0 and Zep largely follow a “retrieve more, rank better” philosophy: search broadly, then rely on ranking to surface what matters. But the fundamental issue isn’t recall volume; it’s the precision of completion.
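A rough sketch of the difference, assuming each retrieved memory already carries a relevance score in [0, 1] from some retriever: the “retrieve more” baseline injects up to 50 results, while a precision-first filter injects only strong matches under a small cap. The threshold (0.8) and cap (3) are illustrative values, not tuned numbers, and neither function is modeled on a specific product’s API.

```python
from typing import NamedTuple


class Memory(NamedTuple):
    text: str
    score: float  # assumed relevance score in [0, 1] from some retriever


def top_k(candidates: list[Memory], k: int = 50) -> list[Memory]:
    """'Retrieve more, rank better': inject the k highest-scoring results."""
    return sorted(candidates, key=lambda m: m.score, reverse=True)[:k]


def select_for_injection(
    candidates: list[Memory], min_score: float = 0.8, max_items: int = 3
) -> list[Memory]:
    """Precision first: only strong matches, and never more than max_items."""
    strong = [m for m in candidates if m.score >= min_score]
    return sorted(strong, key=lambda m: m.score, reverse=True)[:max_items]


scores = [0.95, 0.91, 0.85, 0.82, 0.60] + [0.30] * 45
candidates = [Memory(f"memory {i}", s) for i, s in enumerate(scores)]

print(len(top_k(candidates)))  # 50
print([m.text for m in select_for_injection(candidates)])
# ['memory 0', 'memory 1', 'memory 2']
```

Returning nothing is a valid outcome for the filter; injecting weak matches “just in case” is exactly the dilution described above.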

Why skills don’t solve this either

Another instinctive reaction: encode every agreement into a skill. You say “deep dive,” it triggers the deep-dive skill, which hardcodes the output directory and workflow. Problem solved?

It works short-term. But the skill approach hits two fatal problems at scale:

  1. Quantity explodes. Every user habit, every implicit agreement, every “what we agreed last time” becomes a skill? You quickly end up with hundreds of skills, and maintenance cost exceeds the benefit.
  2. Semantics overlap. The more skills you have, the fuzzier the trigger conditions become. Should “publish” fire the blog-publish skill or the cloudflare-deploy skill? Is “tidy this up” file-organize or memory-cleanup? Routing itself becomes a new source of ambiguity.
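The overlap problem shows up even in a deliberately naive router. This sketch matches utterance words against hypothetical trigger sets for the four skills named above; the trigger words are invented for illustration, and real routers are usually more sophisticated, but any router has to decide what to do when two skills plausibly own the same word.

```python
# Hypothetical trigger words for the skills named above.
SKILL_TRIGGERS = {
    "blog-publish": {"publish", "blog", "post"},
    "cloudflare-deploy": {"publish", "deploy", "ship"},
    "file-organize": {"tidy", "organize", "sort"},
    "memory-cleanup": {"tidy", "prune", "forget"},
}


def route(utterance: str) -> list[str]:
    """Return every skill whose trigger words overlap the utterance."""
    words = set(utterance.lower().split())
    return sorted(name for name, triggers in SKILL_TRIGGERS.items() if words & triggers)


print(route("publish this"))     # ['blog-publish', 'cloudflare-deploy']
print(route("tidy this up"))     # ['file-organize', 'memory-cleanup']
print(route("deploy the site"))  # ['cloudflare-deploy']
```

Every new skill adds another trigger set that can collide with the existing ones, so the ambiguous cases grow with the catalog.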

Skills are great for encoding stable, high-frequency, clearly-bounded workflows. But most partial signals are ad-hoc, low-frequency, and fuzzy-bounded — exactly the things you’d never write a dedicated skill for.

Three layers of signal completion

After months of running agents in production, I see three layers where signals break:

Layer 1: “What were we just talking about?”

The most common failure. You and the agent established context 10 messages ago. The agent’s session compacted. Now your “continue” or “do that thing” has no referent.

This is a recency problem. The signal you gave (“continue”) needs to be completed with the immediately preceding context.
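A minimal sketch of Layer 1 completion, assuming the agent keeps some record of the most recent active state (the task and the agreed next step). The `REFERENTLESS` phrase list and the `ActiveState` type are hypothetical; the point is only that a follow-up like “continue” gets resolved against recent state, and that with no state the honest move is to ask.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical follow-ups that carry no referent of their own.
REFERENTLESS = {"continue", "go on", "keep going", "do that thing"}


@dataclass
class ActiveState:
    task: str       # what the user and agent were just doing
    next_step: str  # the step both sides last agreed on


def complete_followup(message: str, state: Optional[ActiveState]) -> str:
    """Expand a referent-less follow-up using the most recent active state."""
    normalized = message.strip().lower().rstrip(".!")
    if normalized not in REFERENTLESS:
        return message  # the message already names what it refers to
    if state is None:
        return f"{message} (no recent context: ask what to continue)"
    return f"{message}: resume '{state.task}', next step: {state.next_step}"


state = ActiveState(task="deep dive on context compaction", next_step="write the summary")
print(complete_followup("continue", state))
# continue: resume 'deep dive on context compaction', next step: write the summary
```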

Layer 2: “What’s the procedure for this?”

You say “publish the blog.” The agent knows how to publish — there’s a documented workflow. But it doesn’t consult the workflow. It improvises. Gets the output directory wrong.

This is a routing problem. The signal (“publish”) needs to be completed with the operational guide that defines the procedure.
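The routing mechanism (intent → registry → mandatory preflight) can be sketched like this, assuming intents have already been detected upstream. `PROCEDURE_REGISTRY`, the guide paths, and `PreflightError` are hypothetical names: a registered intent must load its guide before the agent acts, and a missing guide is treated as an error rather than a license to improvise.

```python
from pathlib import Path
from typing import Callable

# Hypothetical registry: intent -> operational guide (paths are illustrative).
PROCEDURE_REGISTRY = {
    "publish_blog": Path("guides/blog-publish.md"),
    "deep_dive": Path("guides/deep-dive.md"),
}


class PreflightError(RuntimeError):
    """Raised when a registered procedure's guide cannot be loaded."""


def preflight(intent: str, read_guide: Callable[[Path], str] = Path.read_text) -> str:
    """Load the guide for a registered intent before the agent may act.

    Returns "" for intents with no documented procedure. For registered
    intents, a missing guide is an error instead of a cue to improvise.
    """
    guide_path = PROCEDURE_REGISTRY.get(intent)
    if guide_path is None:
        return ""
    try:
        return read_guide(guide_path)
    except OSError as exc:
        raise PreflightError(f"guide for {intent!r} not found at {guide_path}") from exc


# Usage with a stand-in reader instead of the filesystem:
print(preflight("publish_blog", read_guide=lambda p: f"<contents of {p.name}>"))
# <contents of blog-publish.md>
```

The returned guide text would be placed in front of the task prompt, so the agent consults the procedure instead of guessing the output directory.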

Layer 3: “We’ve done this before”

You discussed a convention three weeks ago. “Deep dive outputs go to Learning/, not Clippings/.” The agent followed it perfectly that day. Three weeks later, it’s gone.

This is a durable knowledge problem. The signal needs to be completed with learned patterns that were never formally documented.
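For Layer 3, one hedged way to picture “learned patterns” is a store of conventions tagged with topic words, recalled only when every topic word appears in the new message. The `ConventionStore` class, its exact-word topic detection, and the second (blog) convention are assumptions made for this sketch; a real knowledge graph would need proper topic detection, but the precision-over-recall choice (all topics must match) is the idea being illustrated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Convention:
    rule: str
    topics: frozenset[str]  # every topic word must appear for the rule to be recalled


@dataclass
class ConventionStore:
    items: list[Convention] = field(default_factory=list)

    def learn(self, rule: str, topics: set[str]) -> None:
        self.items.append(Convention(rule, frozenset(topics)))

    def recall(self, message: str) -> list[str]:
        """Return rules whose topics all appear in the message (precision over recall)."""
        words = set(message.lower().split())
        return [c.rule for c in self.items if c.topics <= words]


store = ConventionStore()
store.learn("Deep dive outputs go to Learning/, not Clippings/", {"deep", "dive"})
store.learn("Blog posts follow the publishing guide", {"publish", "blog"})  # hypothetical

print(store.recall("let's do a deep dive on attention"))
# ['Deep dive outputs go to Learning/, not Clippings/']
```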

ThreadMark: solving Layer 1

I built ThreadMark to solve the first and most common layer.

The idea is simple: when a session compacts (the context window fills up and older messages get summarized), ThreadMark captures a small packet of active context — what you were doing, what you decided, what the next step is — and preserves it across the boundary.

┌───────────────────────────────────┐
│  Compaction Summary (~2000 chars) │ ← lossy, structural, drops nuance
├───────────────────────────────────┤
│  RECENT_CONTEXT.md (ThreadMark)   │ ← verbatim bridge across compaction
├───────────────────────────────────┤
│  Recent turns (6-7 post-compact)  │ ← full fidelity, but short window
└───────────────────────────────────┘

Without ThreadMark, the compaction summary loses the active state — the thing you’re currently doing, the decision you just made, the nuance of your last instruction. With ThreadMark, that state survives.
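The stacking in the diagram can be sketched as a function that assembles the post-compaction context in that order. This is not ThreadMark’s code, which this post does not show; the section titles and the `keep_turns=7` default (taken from the “6-7 post-compact” turns in the diagram) are assumptions.

```python
def assemble_context(
    summary: str, recent_context: str, recent_turns: list[str], keep_turns: int = 7
) -> str:
    """Stack post-compaction context in the diagram's order.

    1. the lossy compaction summary
    2. the verbatim bridge (RECENT_CONTEXT.md), skipped if empty
    3. the last few turns at full fidelity
    """
    sections = [("Compaction summary", summary.strip())]
    if recent_context.strip():
        sections.append(("RECENT_CONTEXT.md", recent_context.strip()))
    sections.append(("Recent turns", "\n".join(recent_turns[-keep_turns:])))
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


turns = [f"turn {i}" for i in range(10)]
print(assemble_context("Deep dive in progress.", "Next step: write the summary.", turns))
```

The bridge sits between the summary and the verbatim tail, so the active state survives even when the summary drops it.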

It’s deliberately minimal. No archival memory. No knowledge graph. Just enough recent context to ground your next vague follow-up.
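To illustrate “deliberately minimal,” here is a sketch that renders the captured packet (what you were doing, what you decided, the next step) into a short markdown note under a hard character budget. The `ContextPacket` fields, the note layout, and the 1500-character default are hypothetical; this post does not specify ThreadMark’s file format or limits, and the hook that would call this at compaction time is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ContextPacket:
    doing: str                                          # what the user is working on now
    decisions: list[str] = field(default_factory=list)  # decisions from recent turns
    next_step: str = ""                                 # the agreed next step


def render_packet(packet: ContextPacket, max_chars: int = 1500) -> str:
    """Render the packet as a short markdown note, hard-capped at max_chars."""
    lines = ["# Recent context", "", f"Doing: {packet.doing}", "", "Decisions:"]
    decision_lines = [f"- {d}" for d in packet.decisions] or ["- (none)"]
    lines += decision_lines
    lines += ["", f"Next step: {packet.next_step or '(none)'}"]
    text = "\n".join(lines) + "\n"
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 4] + "...\n"  # trim, but stay within the budget


packet = ContextPacket(
    doing="deep dive on context compaction",
    decisions=["save output to Learning/", "use the deep dive guide"],
    next_step="draft the outline",
)
print(render_packet(packet))
```

A fixed budget keeps the bridge small enough that it grounds the next follow-up without competing with the recent turns for attention.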

Why minimal matters

ThreadMark captures only what’s needed to resolve ambiguity in the next few messages. It doesn’t try to remember everything forever. This is intentional.

The partial signal problem has a temporal structure:

  • Seconds to minutes ago: “continue” → needs ThreadMark (Layer 1)
  • Earlier today / this week: “use the usual process” → needs procedure routing (Layer 2)
  • Weeks to months ago: “like we discussed” → needs durable knowledge graph (Layer 3)

Each layer requires a different mechanism with different precision/recall tradeoffs. Trying to solve all three with one system (bigger memory, more retrieval) dilutes all of them.
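One way to picture the temporal split is a dispatcher that maps cue phrases to the layer that has to complete them. The cue lists reuse the examples from the bullets and are otherwise invented; substring matching is a stand-in for real detection. Note that one message can need more than one layer.

```python
from enum import Enum


class Layer(Enum):
    RECENT = 1     # Layer 1: recent context (ThreadMark)
    PROCEDURE = 2  # Layer 2: procedure routing
    KNOWLEDGE = 3  # Layer 3: durable knowledge


# Hypothetical cue phrases; substring matching stands in for real detection.
CUES = [
    (Layer.RECENT, ("continue", "go on", "that thing")),
    (Layer.PROCEDURE, ("the usual process", "as usual", "publish")),
    (Layer.KNOWLEDGE, ("like we discussed", "as we agreed", "last time")),
]


def layers_needed(message: str) -> list[Layer]:
    """Return every layer whose cue phrases appear in the message."""
    text = message.lower()
    return [layer for layer, phrases in CUES if any(p in text for p in phrases)]


print(layers_needed("continue"))                      # [<Layer.RECENT: 1>]
print(layers_needed("use the usual process"))         # [<Layer.PROCEDURE: 2>]
print(layers_needed("publish it like we discussed"))  # [<Layer.PROCEDURE: 2>, <Layer.KNOWLEDGE: 3>]
```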

ThreadMark solves Layer 1 cleanly because it accepts a narrow scope. It’s a context bridge, not a memory system.

The broader architecture

The full stack I have in mind looks something like this:

User's partial signal
        │
        ├── Layer 1: Recent context (ThreadMark) — "what just happened"
        │         Mechanism: capture + inject at session boundary
        │
        ├── Layer 2: Procedure routing — "what guide applies here"
        │         Mechanism: intent → registry → mandatory preflight
        │
        └── Layer 3: Knowledge graph — "what have I learned before"
                  Mechanism: topic detection → precision retrieval

Each layer completes a different temporal range of missing signal. Together they turn a half-said instruction into a complete one.

The point: these are completion mechanisms, not memory mechanisms. The goal isn’t to remember everything. It’s to fill in exactly what the user left out, with minimum noise.
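The “fill in exactly what the user left out” principle can be sketched as a pipeline in which each layer proposes values but may only fill slots that are still empty, and the pipeline stops once the instruction is complete. The slot names and the three stand-in layers in the usage are hypothetical, not ThreadMark’s interfaces.

```python
from typing import Callable

Slots = dict[str, str]
Completer = Callable[[Slots], Slots]  # a layer proposes values for slots


def complete(signal: Slots, required: tuple[str, ...], layers: list[Completer]) -> Slots:
    """Run layers in order; each may fill only slots that are still empty."""
    result = dict(signal)
    for layer in layers:
        missing = [k for k in required if not result.get(k)]
        if not missing:
            break  # the instruction is complete: stop consulting further layers
        proposals = layer(result)
        for key in missing:
            if proposals.get(key):
                result[key] = proposals[key]
    return result


def recent(slots: Slots) -> Slots:     # Layer 1 stand-in: knows the current topic
    return {"topic": "context compaction"}


def procedure(slots: Slots) -> Slots:  # Layer 2 stand-in: knows which guide applies
    return {"guide": "deep dive guide"} if slots.get("intent") == "deep_dive" else {}


def knowledge(slots: Slots) -> Slots:  # Layer 3 stand-in: remembers where outputs go
    return {"destination": "Learning/", "guide": "an older guide"}


print(complete({"intent": "deep_dive"}, ("intent", "destination", "guide"),
               [recent, procedure, knowledge]))
# {'intent': 'deep_dive', 'guide': 'deep dive guide', 'destination': 'Learning/'}
```

The unrequested `topic` proposal is dropped, and Layer 3’s older guide never overrides Layer 2’s answer: minimum noise in miniature.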

Try it

ThreadMark is open source and works with OpenClaw agents: github.com/FuzzyTG/threadmark

It installs as a hook + plugin pair. One script, one restart. Your agent’s vague follow-ups start working.

If you’re building agents that interact with humans over long sessions, the partial signal problem is probably your biggest silent failure mode. Start with Layer 1 — it’s where most of the pain is.


ThreadMark solves only the smallest layer of this problem — the near-context rupture at session boundaries. The bigger challenges (workflow routing, knowledge graphs, cross-session intent accumulation) are far more complex. I’ll go deeper in future posts.



This post is licensed under CC BY 4.0 by the author.