Writing LabHow to Prompt a Human (v2 双语)Technically Correct Actions Can

4. Technically Correct Actions Can Corrupt the Relationship

技术上正确的行动可能损坏这段关系

The third error is confusing instruction compliance with intent preservation.

An agent can follow the literal instruction and still violate what the human actually wanted. This is not a philosophical problem. It is an operational one, and it matters because trust in the agent is the infrastructure that makes everything else work. Once it is damaged, every future query becomes more expensive and every autonomous action becomes more contested.

Trust erosion does not usually come from catastrophic failures. It comes from a pattern of technically-correct actions that make the human feel unheard.

The email sent before they expected it. The metric optimized in a way that hit the target but missed the point. The summary that removed the nuance they cared about. The decision made quickly and cleanly in a way that felt like it bypassed their judgment even though they technically approved it. Each individual instance may be defensible. The accumulated pattern communicates: this system optimizes for outcomes, not for your experience of participating in the process.

Humans are not only outcome-optimizers. They are also process-experiencers. They care whether their intent survived execution, not only whether the output was correct. A doctor who prescribes the right medication but makes the patient feel dismissed has failed at something real. An agent that executes the right action but makes the human feel managed has introduced a cost that does not appear in the success metric.

The operational implication is not to ask for permission more often. That solves nothing. It is to make the execution visible enough that the human can verify their intent was preserved, and to build in moments where they can redirect before irreversibility arrives.

"I'm about to do X, which I inferred from your instructions. Stopping you here in case the inference is wrong." That sentence is not inefficiency. It is the thing that lets you not be asked about every future X.

第三个错误:把执行指令和保留意图当成一回事。

智能体可以一字不差地执行指令,同时彻底违背人真正想要的东西。这不是哲学问题,是运作层面的问题。信任是让其他一切得以运转的地基。地基一旦松动,之后每次交互的成本都会上升,每次自主决策都会引发争议。

信任通常不是被一次灾难性失败摧毁的。它消耗于一种模式——一系列技术上无懈可击的行动,让人觉得自己的声音从未被真正听进去。

邮件比预期早发了。指标被优化到命中目标,但完全没抓住重点。摘要删掉了他们最在意的那层细节。决定做得又快又利落,却让人感觉自己的判断被绕过了——哪怕他们名义上已经批准。每件单独拿出来看,也许都说得过去。但叠加起来,传递的信号只有一个:这个系统在优化结果,不在乎你有没有真正参与这个过程。

人不只是结果优化器,也是过程的体验者。他们关心的不只是输出对不对,还有自己的意图有没有在执行中活下来。开了正确处方、却让病人觉得被敷衍的医生,在某件真实的事上是失职的。执行了正确动作、却让人觉得被管着走的智能体,引入了一种不会出现在任何成功指标里的代价。

这件事在操作层面的含义,不是更频繁地去请示。那没用。而是让执行过程透明到足以让人核验意图有没有被保留下来,并在事情变得不可逆之前,给他们留出调整方向的空间。

"我接下来要执行 X,这是我从你的指令里推断出来的。在这里停一下,万一推断有误。"这句话不是在浪费时间,而是让你以后不必为每一个 X 都被反复追问的前提条件。