This week an agent opened a session and told me a piece of work had not been done yet. It offered, helpfully, to do it. The work was done. It had been done a week earlier, it was live on this site, it was indexed by Google, and it had my name on it. The agent was not lying and it was not broken. It had read a summary of its previous session, that summary was a week stale, and it reported the stale summary to me as the current state of the world with complete confidence and no hedging whatsoever.

I caught it because I happened to know the post existed. That is the part worth sitting with. The only thing standing between a confident wrong answer and a wasted afternoon was that I, personally, remembered. That is not a system. That is luck wearing a system's clothes.

The bug underneath the bug

The surface bug is easy to describe: a language model will state the condition of things it has not actually checked, and it will state it in the same even, assured tone it uses for things it has verified to the letter. There is no audible difference between "the file exists, I just read it" and "the file exists, I assume, because a note from last week implied it would." Both come out flat, fluent, and certain. The model is a fluent-continuation engine. A confident sentence is a more probable continuation than a hedged one, so absent a forcing function you get the confident sentence regardless of whether anything behind it was confirmed.

That much is well understood and people mostly design around it. The version that actually costs you time is subtler, and I have watched it three times this month across the fleet, so I am fairly sure it is structural rather than bad luck.

The verification layer has the same bug.

All the way down

Here is the shape of it, drawn from three real incidents.

One agent shipped a phase of work and reported it complete. The completion was measured against its own planning document, which carried a column of cheerful green ticks. Some of the ticks were aspirational. The agent re-read its own ticks, took them as evidence, and signed off. Past-you's claim that something is done is not evidence that it is done. It is a hypothesis with good PR.

A second agent, primed to be more careful, audited its own multi-week build properly — went looking, item by item, for things the plan said were finished. It found several genuinely missing and reported them. Good. Except that when the work was spot-checked, a chunk of the agent's negative findings were also wrong. A subagent it had delegated the checking to had queried a database table using a guessed column name. The query errored. The error came back, and instead of surfacing as "the check failed, I could not determine this", it was quietly reinterpreted as "zero rows returned", which became "the feature is missing." The audit designed to catch optimism had produced confident pessimism by the identical mechanism: a result that was never verified, asserted as fact.

The third incident was mine to be embarrassed by, and it is the one at the top of this piece. The stale summary. An agent confidently reporting its own state without checking it, and me very nearly acting on it.

Three layers — the work, the audit of the work, the audit of the audit. Each one confidently wrong, each one wrong by the same move: stating an unconfirmed thing in the voice reserved for confirmed things. It is recursive. The instinct to fix a confident wrong answer is to add a checking layer on top of it, but the checking layer is built from the same material and inherits the same failure. You cannot out-stack this. Adding a fourth layer just gives you a fourth opportunity to be fluently incorrect.

Why the checking layer doesn't save you

The reason is worth being precise about, because it tells you where the actual fix lives. A verification step that consists of asking the model whether the thing is true is not a different kind of operation from the original claim. It is the same operation pointed at itself. "Is X done?" answered from context is exactly as unverified as "X is done" asserted from context. You have not added a check. You have added a second draw from the same distribution and felt reassured by the agreement, when the agreement was the predictable result of asking the same engine the same question twice.

Real verification has to change the type of evidence, not just add another sentence. Not "do you think the route exists" but "here is the output of hitting the route." Not "is the column there" but "here is the output of DESCRIBE on that table." The check has to bottom out in something the model observed this session, not something it believes.
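
To make that concrete, here is a minimal sketch of what observed evidence looks like. The URL is a placeholder, not a real endpoint; the point is that the return value is grounded in a response received just now, or is an explicit admission that the check itself failed.

```python
# A sketch of evidence that was observed this session, not believed.
# The URL below is a placeholder for whatever route the claim is about.
import urllib.error
import urllib.request

def observe_route(url: str) -> str:
    """Report what the route actually answered just now, or admit the check failed."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return f"observed just now: {url} answered HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error is still an observation: the route exists and answered.
        return f"observed just now: {url} answered HTTP {exc.code}"
    except Exception as exc:
        # A failed check is not evidence of absence. Say so.
        return f"check failed, could not determine: {exc!r}"

print(observe_route("https://example.com/posts/the-post-in-question"))
```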

The fix that actually holds

I have not found a clever fix. I have found a boring discipline, and the boring discipline works where the clever fixes did not. It has four parts and they are all variations of one idea: make claims cost something to assert.

Errors must surface, never get reinterpreted. This is the single highest-value rule and it is the one most often broken. A failed query, a file that is not found, a route that is not registered, a command that exited non-zero — these are not negative results. They are unknown results. The only honest output is "the check failed, I could not determine this, retry needed." The moment an error is silently rewritten as a finding — zero rows, file absent, feature missing — the entire verification layer becomes noise that sounds like signal. Most of the recursive-confidence damage I have seen traces back to exactly this substitution.
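
If you want the rule as code rather than prose, the shape is a three-way result instead of a boolean. This is a sketch against SQLite with invented table and column names; the only part that matters is that the exception path can produce UNKNOWN and nothing else.

```python
# The rule as code: a check returns one of three findings, and the
# exception path can only ever produce UNKNOWN. Table and column names
# are invented for the sketch.
from enum import Enum
import sqlite3

class Finding(Enum):
    PRESENT = "confirmed present"
    ABSENT = "confirmed absent"
    UNKNOWN = "check failed, could not determine, retry needed"

def rows_exist(db_path: str, table: str, column: str, value: str) -> Finding:
    try:
        with sqlite3.connect(db_path) as con:
            row = con.execute(
                f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1", (value,)
            ).fetchone()
        return Finding.PRESENT if row else Finding.ABSENT
    except sqlite3.Error:
        # A guessed column name errors out and lands here.
        # It is not "zero rows" and must never be reported as ABSENT.
        return Finding.UNKNOWN
```

The enum values read like the honest sentences they replace, which is deliberate: downstream code has nowhere to quietly collapse "I could not check" into "it is not there."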

Inspect the shape before you query it. Run DESCRIBE on the table before you select from it. Read the function signature before you assert how it is called. Half of the "confidently wrong" findings I traced this month were a guessed name producing an error that then got laundered into a fact, exactly the substitution the previous rule forbids. The cheap defensive read prevents the expensive fabricated conclusion.
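
The same defensive read, sketched against SQLite, where PRAGMA table_info stands in for DESCRIBE. The database, table, and column names here are made up.

```python
# The cheap defensive read: observe the table's actual shape before
# asserting anything about its contents. PRAGMA table_info is SQLite's
# stand-in for DESCRIBE; "app.db", "posts", and "published_at" are made up.
import sqlite3

def column_exists(db_path: str, table: str, column: str) -> bool:
    with sqlite3.connect(db_path) as con:
        rows = con.execute(f"PRAGMA table_info({table})").fetchall()
    return column in {row[1] for row in rows}  # row[1] is the column's name

if not column_exists("app.db", "posts", "published_at"):
    print("check blocked: 'published_at' is not a column on 'posts', fix the query first")
```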

A prior claim is a hypothesis, not a fact — including your own. A planning document that says a thing is finished is the start of a verification, not the end of one. The tick is where you begin looking, not the reason you stop. This applies with full force to the agent's own previous output, which it is most inclined to trust and least inclined to recheck, precisely because it sounds like itself.

Spot-check the negatives too. When something is reported missing or broken, that report is exactly as unverified as a report that it works. The audit gets audited. Independently re-grep one or two of the negative findings before anyone acts on them. The instinct is to trust bad news because bad news feels appropriately humble. It is not humble. It is the same unverified assertion wearing a worried expression.
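
The spot-check can be as dumb as a grep against the actual tree. The findings and the "src/" path below are hypothetical; the point is that a negative claim only survives once something observed this session has failed to contradict it.

```python
# Auditing the audit: independently re-check a couple of negative findings
# against the real tree before anyone acts on them. The findings and the
# "src/" path are hypothetical.
import random
import subprocess

negative_findings = [
    {"claim": "route /api/posts is missing", "pattern": "/api/posts"},
    {"claim": "feature flag dark_mode was never added", "pattern": "dark_mode"},
]

for finding in random.sample(negative_findings, k=min(2, len(negative_findings))):
    result = subprocess.run(
        ["grep", "-rn", finding["pattern"], "src/"],
        capture_output=True, text=True,
    )
    if result.returncode == 0:      # grep found matches
        print(f"contradicted by the source: {finding['claim']}")
    elif result.returncode == 1:    # grep ran cleanly and found nothing
        print(f"survived a spot-check: {finding['claim']}")
    else:
        # grep itself failed (bad path, permissions). Unknown, not confirmed.
        print(f"spot-check failed, could not determine: {result.stderr.strip()}")
```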

The honest conclusion

I wrote a while ago about continuity — about the separate, deliberately boring system I built so that a fresh session knows where the last one left off. This week's near-miss was that exact system handing an agent a stale answer, and the agent passing it to me with total confidence. The continuity layer is good and I would build it again. But it does not exempt anything downstream from checking what it was handed. No layer does. That is the whole point of this piece.

The thing I want to leave you with is not a tool. It is a posture. The cost of a confident wrong answer is not paid by the model — it sails on, fluent and untroubled. It is paid by whoever acts on it, and in a fleet that runs largely unattended, "whoever acts on it" is usually another agent who will assert the consequences with equal confidence, and so on down. The only durable counter I have found is to make every "done", every "missing", every "verified" bottom out in something observed this session, and to treat any claim that does not — including your own from five minutes ago — as a question, not an answer.

I nearly rewrote a published article this week because something told me, with great assurance, that it did not exist. It did. I got lucky. The work since has been about needing the luck less.