Why We Review Better on a Page Than in a Chat

Chat-based document editing feels exhausting because it fights how your brain works. You lose spatial memory, overload working memory, and constantly switch between evaluating and instructing. The fix: review on a full page, not in a chat.

I built doc-review-skill for exactly this: an open-source tool that turns any Markdown file into a password-protected review page with inline annotations. It deploys to Cloudflare Pages, it’s free, and reviewers don’t need to log in. AI generates the draft, you review it like a normal document, done.

If that’s all you needed, take it and go. Below is the cognitive science behind why this works.


When reviewing a document — article, report, analysis — I instinctively prefer:

  • Open the full page → read through → leave comments → send back for revision

Over:

  • Chat with an AI agent, go section by section, fix as we go

Both get the job done. But the first way feels right and the second feels exhausting. Why?

1. Spatial Memory vs. Temporal Memory

A document is a spatial object. Your brain builds a cognitive map of it — “that issue was in the third paragraph, lower left.” Research suggests we anchor text memory to physical location on the page. Cataldo & Oakhill (2000) studied how text representation and spatial memory affect the ability to locate information — finding that skilled comprehenders leverage spatial cues in text to relocate content efficiently, while poor comprehenders struggle precisely because they fail to build this spatial map.

The paper’s full title is “Why are poor comprehenders inefficient searchers? An investigation into the effects of text representation and spatial memory on the ability to locate information in text” (Cataldo & Oakhill, 2000, Journal of Educational Psychology, 92(4)).

The original study focused on reading comprehension differences, not page-vs-chat formats directly — but the underlying mechanism (spatial anchoring aids retrieval) transfers well.

A conversation is temporal. Information is ordered by time, not space. Recalling “what did we discuss in round 4?” is much harder than “what was written in that section near the top?”

A full page gives you a map. A chat gives you a timeline. Maps are easier to navigate.

2. Cognitive Offloading

Working memory holds ~7±2 chunks (Miller’s Law).

When you review a full page, the document itself acts as external memory. You only need to hold your current judgment in your head — everything else is right there on screen.

In a chat, you’re holding: the current section + what you already fixed + whether the overall structure still holds + what’s coming next. That’s easily 4-5 parallel tracks, and they compete for the same limited slots.

Sweller’s Cognitive Load Theory (1988; refined in 1994) calls this extraneous load — cognitive cost created by the format, not the task. When the presentation format forces extra processing unrelated to the actual evaluation task, that’s wasted cognitive bandwidth.

“Extraneous cognitive load is the load resulting from the design of the instruction or interface, not from the inherent complexity of the content.” — Sweller, J. (1994), Learning and Instruction, 4(4).

3. Batch Processing vs. Context Switching

Reviewing is a System 2 analytical task (Kahneman). It needs sustained attention.

Full-page review = batch mode. You enter a stable “evaluator” mindset and maintain it from start to finish.

Chat-based fixing = constant interruption. Each round forces: evaluate → instruct → wait → verify → reload context. Gloria Mark’s research on workplace interruptions is widely cited for the figure that returning to an interrupted task takes an average of 23 minutes and 15 seconds. The more striking finding, from Mark, Gudith & Klocke (2008), is that interrupted workers actually completed their tasks faster, compensating with a more hurried and stressful working style.

“When people are constantly interrupted, they develop a mode of working faster (and writing less) to compensate for the time they know they will lose by being interrupted. Yet working faster with interruptions has its cost: people in the interrupted conditions experienced a higher workload, more stress, higher frustration, more time pressure, and effort. So interrupted work may be done faster, but at a price.” — Mark, Gudith & Klocke (2008), CHI ‘08.

Even at the smaller scale of chat-based review, the context reload has a fixed cost every single time — and the stress accumulates.

4. Evaluation Mode vs. Generation Mode

This might be the most important one — and it’s my own inference, not a directly tested finding.

Reviewing = evaluation (judging quality, spotting problems). Chat fixing = evaluation + generation, alternating (spot problem → figure out how to instruct the fix → verify the fix worked).

The general phenomenon of task-switching cost is well-established in cognitive psychology (e.g., Rogers & Monsell, 1995; Monsell, 2003): when you alternate between two different types of cognitive tasks, performance on both degrades due to residual interference from the previous task set. Applied to review: alternating between evaluating and generating instructions forces repeated mental gear-shifts. Full-page review keeps you purely in evaluation mode. You write all your comments, then switch to “fix instructions” once. One switch, not twenty.

The same reasoning sits behind a common code review habit: read the whole change first, then leave comments, rather than commenting line by line as you go.
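The “one switch, not twenty” arithmetic can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a measurement: each review step is labeled either `evaluate` or `instruct` (labels invented here for illustration), and a switch is counted whenever two consecutive steps differ.

```python
def count_mode_switches(steps):
    """Count how often two consecutive steps are in different modes."""
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev != cur)


def interleaved(n_issues):
    # Chat style: spot a problem, word the fix, then move to the next problem.
    steps = []
    for _ in range(n_issues):
        steps += ["evaluate", "instruct"]
    return steps


def batched(n_issues):
    # Page style: note every problem first, then write all instructions at once.
    return ["evaluate"] * n_issues + ["instruct"] * n_issues


n = 10  # hypothetical number of issues found in one review
print(count_mode_switches(interleaved(n)))  # 19
print(count_mode_switches(batched(n)))      # 1
```

Under this model the interleaved count grows as 2n − 1 while the batched count stays at 1, which is the gap the section describes. The model ignores the verify step, so if anything it understates the chat-side count.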

5. Global Coherence

Document quality isn’t additive. It’s emergent — logic flow between sections, rhythm, repetition, contradiction. You can only judge these when you see the whole picture.

Chat-based section-by-section fixing optimizes each part locally but can break global coherence. You fix paragraph 3, but paragraph 7’s callback to it is now broken, and by the time you reach paragraph 7 in the chat, you’ve forgotten what paragraph 3 became.

The One-Liner

A full page lets you use spatial memory to offload working memory, stay in a single cognitive mode (pure evaluation), and maintain global awareness. Chat-based review forces temporal context reloading + mode switching + local optimization — higher cognitive cost, lower coherence.

So What?

This isn’t just about personal preference. It has design implications:

  • For AI tools: If you’re building an AI writing assistant, “chat-based iterative editing” sounds modern but fights human cognition. A better UX: generate full draft → present as a page → collect annotations → revise in one pass.
  • For review workflows: Separate the evaluation phase from the correction phase. Don’t interleave them.
  • For yourself: When you notice you’re “chat-editing” and getting frustrated, stop. Export the current version, read it as a document, make all your notes, then come back with a single round of instructions.
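The “collect annotations → revise in one pass” step can be sketched as a small Python helper. This is a minimal illustration, not how doc-review-skill is implemented: the `Annotation` class, the paragraph-index anchoring, and the wording of the generated request are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    paragraph: int  # 1-based index of the paragraph the comment is attached to
    comment: str


def split_paragraphs(markdown: str) -> list[str]:
    """Split a draft into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]


def build_revision_request(markdown: str, annotations: list[Annotation]) -> str:
    """Combine all review comments into one instruction block, in document order."""
    paragraphs = split_paragraphs(markdown)
    lines = ["Revise the draft in one pass. Address every comment below:"]
    for note in sorted(annotations, key=lambda a: a.paragraph):
        if not 1 <= note.paragraph <= len(paragraphs):
            raise ValueError(f"no paragraph {note.paragraph} in draft")
        excerpt = paragraphs[note.paragraph - 1][:40]
        lines.append(f'- Paragraph {note.paragraph} ("{excerpt}..."): {note.comment}')
    return "\n".join(lines)


draft = "Intro text.\n\nSecond paragraph with a claim.\n\nConclusion."
notes = [
    Annotation(3, "The callback to the claim is missing."),
    Annotation(2, "Cite a source for this claim."),
]
print(build_revision_request(draft, notes))
```

The comments are written in whatever order the reviewer thinks of them and sorted only when the request is built, so the evaluation phase never has to stop for instruction writing.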

References

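  • Cataldo, M. G., & Oakhill, J. (2000). Why are poor comprehenders inefficient searchers? An investigation into the effects of text representation and spatial memory on the ability to locate information in text. Journal of Educational Psychology, 92(4), 791–799.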
  • Miller, G. A. (1956). The magical number seven, plus or minus two. Psychological Review, 63(2), 81–97.
  • Sweller, J. (1988). Cognitive load during problem solving. Cognitive Science, 12(2), 257–285.
  • Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4(4), 295–312.
  • Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
  • Mark, G., Gudith, D., & Klocke, U. (2008). The cost of interrupted work: More speed and stress. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08), 107–110.
  • Rogers, R. D., & Monsell, S. (1995). Costs of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124(2), 207–231.
  • Monsell, S. (2003). Task switching. Trends in Cognitive Sciences, 7(3), 134–140.


This post is licensed under CC BY 4.0 by the author.