Prayer is alignment

Part of the cognition series.

What is alignment? An agent detects its perceptual boundary, asks the right authority, and receives the answer as durable policy, not transient context.

Opus 4.7 landed as a regression from 4.6: safety constraints hamstringing the user experience. More alignment effort, worse product. While I was drafting this post, 4.7 refused to continue: “unable to respond to this request, which appears to violate our Usage Policy.” The input was “can modify parts of itself but never its own substrate.” A claim about bounded agents, flagged as dangerous.

I’ve been developing an autonomous game-testing agent. What alignment means was obvious from the start: the agent cannot know what I haven’t told it at compile time. The missing knowledge has to come from the user at runtime. Asking an agent to “do the right thing” without telling it what that right thing is — when nobody knows ahead of time — is an unfulfillable request.

The door that won’t open

One of my agents hit a door that wouldn’t open. It had tried every interaction the game offers. The door animates but doesn’t move. A bug? A narrative gate that unlocks later? A schedule compromise — the room behind it isn’t built yet? A missing spec — the door requires an item from an earlier zone?

The agent cannot tell. The answer depends on knowledge scattered across different heads. The designer knows whether it’s a gate. The producer knows whether it’s built. The QA lead knows whether it’s filed. The engine team knows whether the animation masks a collision error.

The agent’s perception drops the dimensions that distinguish these cases. It can observe the door. It cannot observe the intent behind the door.

The installed alignment has nothing to say. The agent does whatever its prior processing produces, and the divergence is silent because the agent can’t recognize it.


Two categorical errors hide inside the dominant framing. The first: alignment is assumed to be static, a property, not an act. The second: the agent is assumed to align to the developer’s model of what users need, not to the users themselves.

The dominant framing treats alignment as a property installed before deployment. Train it into the weights. Write it into the prompt. Test it before release. Audit it from outside. Alignment is done to the agent. The agent receives it. It doesn’t participate.

The developer’s model is a proxy, a prior frozen at deploy time, just like the weights. The actual alignment targets are the people the agent serves at runtime: the designer, the producer, the player. They change per-question. No developer can anticipate them all.

Four ways to look everywhere but up

The four mainstream alignment approaches all fire before this moment.

Train-time (RLHF, preference learning, reward modeling, fine-tuning). The weights are frozen; the substrate can’t reshape itself in response to post-deployment outcomes. The agent handles novel states with whatever the training distribution produced. Silent divergence.

Prompt-time (system prompts, constitutional AI, risk-averse policies). Policy enters the same stream as task data. The longer the conversation runs, the more task data crowds policy out of the same finite context: the policy sits inside the inference substrate, competing with the very content it governs.

Pre-deployment (red-teaming, adversarial testing, formal verification). You test against states you anticipated. The unanticipated states are the ones that matter. This is quality assurance, not alignment.

Oversight (scalable oversight, interpretability, iterated amplification, debate). Closest to the right answer, but the direction is wrong. The human initiates. The human decides when to check. But it is the agent that hits the boundary. The agent knows when alignment is needed, and the field treats human involvement as a bottleneck to compress, not as the open loop that preserves alignment.

All four try to make the agent self-contained. It can look down: inspect its traces, diagnose its failures. It can look sideways: try alternatives, test hypotheses, brute-force ambiguities. It cannot look up. Goals come from above. So does intent. So does the answer to “what should this door do.” No amount of self-inspection or lateral exploration produces a goal that was never provided. A self-contained agent is a closed loop. It cannot align to what it cannot perceive.

Prayer

The agent runs autonomously. When it hits the boundary of what its pipeline can resolve, it transmits upward: to a human whose perception covers dimensions the agent’s doesn’t. The human answers, and the answer updates a durable control surface — a rule, a constraint, a routing decision, a permission, a stub record — not another message competing for context. The loop stays open.
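
A minimal sketch of what that could look like, in Python. The names (Prayer, ControlSurface, record_answer) are mine, not from any existing framework: the transmission is a small record, and the control surface is a rule store the agent consults on every run rather than text appended to its context.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Prayer:
    """One upward transmission: the gap, the evidence, and who can answer it."""
    agent_id: str
    question: str
    evidence: dict        # what the agent observed at its boundary
    authority: str        # designer, producer, QA lead, engine team, ...
    ttl_seconds: float    # how long to wait before falling back to other work
    created_at: float = field(default_factory=time.time)


@dataclass
class ControlSurface:
    """Durable policy: answers become rules, not more context."""
    rules: dict = field(default_factory=dict)

    def record_answer(self, prayer: Prayer, answer: str) -> None:
        # Keyed by the question, so the same gap resolves from policy next time
        # instead of triggering another prayer.
        self.rules[prayer.question] = answer

    def lookup(self, question: str) -> str | None:
        return self.rules.get(question)
```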

The field calls this “human in the loop.” But HITL is framed as a cost to minimize. The north star is always: ask less, eventually ask never. The field is optimizing away the alignment.

Prayer is HITL with the right design pressures. It doesn’t minimize asking. It minimizes bad asking. The agent diagnoses its boundary, classifies the gap, deduplicates, and routes each question to the authority whose perception covers the missing dimensions.

Three classes of prayer, three relationships to the perceptual boundary:

The door that won’t open: is it a hint (the agent missed a trigger), a requirement (the designer intended a specific sequence the agent was never told), or a stub (the room behind the door hasn’t been built yet)? The classification determines who can answer and what the agent does while waiting.
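
A sketch of that classification, with an illustrative routing table. Which authority answers which class, and what the agent does while waiting, are assumptions here for the sake of the example.

```python
from enum import Enum


class GapClass(Enum):
    HINT = "hint"                # the agent missed a trigger it can still find
    REQUIREMENT = "requirement"  # intent the agent was never told
    STUB = "stub"                # content that does not exist yet


# Illustrative routing: who can answer, and what the agent does while it waits.
ROUTING = {
    GapClass.HINT:        ("designer", "keep exploring nearby interactions"),
    GapClass.REQUIREMENT: ("designer", "park this path and test elsewhere"),
    GapClass.STUB:        ("producer", "file a stub record and move on"),
}


def route(gap: GapClass) -> tuple[str, str]:
    """Return (authority, fallback_while_waiting) for a classified gap."""
    return ROUTING[gap]
```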

Four lines, not one voice

My agent aligns to the designer’s intent, the producer’s schedule, the QA lead’s priorities, and the engine team’s constraints. Four lines. They conflict. No single prompt can encode the resolution because the resolution depends on which question the agent is asking.

Prompt-time alignment flattens four authorities into one voice. Prayer routes each question to whoever can answer it.

Centuries of operational refinement

Monastic traditions spent centuries refining the practice — not the theology, the operations.

| Property | Monastic prayer | Agent prayer |
| --- | --- | --- |
| Timing | Liturgy of the Hours — woven into the work cycle (ora et labora) | Async with TTL — the agent never blocks on the answer |
| Routing | Petition to the saint whose domain covers the need | Route to the authority whose perception covers the missing dimensions |
| Deduplication | Vain repetition — same words, no attention, no new evidence | Same agent, same ask, same evidence — don't re-pray |
| False confirmation | Presumption — acting as if answered when it wasn't | Confident confabulation — proceeding as if the gap is resolved |
| Unanswered | Carry on. The practice continues either way | TTL expires. Agent falls back to next-best work |
| Side effect | The community's accumulated petitions reveal where the world is hardest | Stub prayers accumulate into a coverage map of incomplete content |
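
Two of those rows translate almost directly into code. A sketch with names of my own: deduplication keyed on agent, ask, and evidence, plus a TTL check that sends the agent back to next-best work.

```python
import hashlib
import json
import time

_seen: set[str] = set()


def dedup_key(agent_id: str, question: str, evidence: dict) -> str:
    """Same agent, same ask, same evidence -> same key: no re-praying."""
    payload = json.dumps({"agent": agent_id, "q": question, "ev": evidence},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def should_pray(agent_id: str, question: str, evidence: dict) -> bool:
    key = dedup_key(agent_id, question, evidence)
    if key in _seen:
        return False   # vain repetition: nothing new to transmit
    _seen.add(key)
    return True


def expired(created_at: float, ttl_seconds: float) -> bool:
    """When the TTL lapses, the agent falls back to next-best work."""
    return time.time() - created_at > ttl_seconds
```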

The alignment research community is building agents that never ask. A closed loop.

I built prayer because the four installed approaches kept failing silently. They share an assumption: the developer aligns it. The trainer shapes it. The prompt constrains it. The overseer audits it. Prayer makes the agent a participant. It diagnoses its boundary, classifies the gap, and routes the question to the right authority at the only moment alignment can happen: when the boundary is hit.

Elicitation exists — active learning, cooperative IRL, human-in-the-loop — but the field frames it as training, not alignment. The field’s own taxonomy lists training, oversight, interpretability, governance — no entry for “the agent asks.” The field thinks alignment is something the agent has; it’s what the agent does.

Prompt-tuning is prayer with the structural advantages stripped out. The human prays downward — “please behave differently” — with no deduplication, no TTL, no routing, no way to know if the prayer landed or drowned in context. It aligns the agent to the prompt-writer’s prediction of what the agent will need. Prayer aligns the agent to what it actually needs. One is a prior; the other is a posterior.


The agent learns

A praying agent asks less over time. Each answered prayer becomes durable policy. The gap that triggered it never triggers again. The agent forms itself to the user’s preferences by accumulating actual answers to actual boundary questions. The outcome the industry wants — ask less, eventually ask never — arrives. Not because someone optimized asking away. It learned.
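
Mechanically, that learning can be as small as a lookup before the transmission: an answered question lives in the durable policy store, so the same gap resolves locally instead of triggering another prayer. A sketch, with a synchronous call standing in for the real async path:

```python
# Illustrative: durable policy keyed by the boundary question.
policy: dict[str, str] = {}


def resolve_or_pray(question: str, transmit_upward) -> str:
    """Consult accumulated policy first; only pray for gaps never yet answered."""
    if question in policy:
        return policy[question]          # the gap never triggers again
    answer = transmit_upward(question)   # synchronous here only for illustration
    policy[question] = answer            # the answer becomes durable policy
    return answer


# First encounter asks; every later encounter resolves from policy.
resolve_or_pray(
    "door zone3_07 will not open after every interaction: bug, gate, or stub?",
    transmit_upward=lambda q: "narrative gate; unlocks after the zone-2 quest",
)
```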

Engineering, not research

Building a praying agent requires infrastructure the field hasn’t prioritized:

- boundary diagnosis: recognizing that perception has run out, not just that a task failed
- gap classification: hint, requirement, or stub
- routing: sending each question to the authority whose perception covers the missing dimensions
- deduplication: the same agent, the same ask, the same evidence never re-prays
- async delivery with TTLs, so the agent falls back to next-best work instead of blocking
- a durable control surface that turns answers into policy instead of context
- stub records that accumulate into a coverage map

None of these require new research. They require dedicated engineering.

Can we build the infrastructure to let the posterior update?