I wrote a few weeks ago about memory: the three layers Claude Code gives you, what belongs in each, and why a directory of markdown files beats a vector database for almost every real-world case. That piece has held up well. I have not changed my mind about any of it.

But there is a second problem that memory does not solve, and it took me a while to notice it was a different problem at all. It goes like this. A session ends. Another one — a fresh process, a fresh context window, possibly a different time of day — starts up an hour later. The agent's memory loads. Its CLAUDE.md loads. It knows who it is, who I am, what the project is for. It has every fact it has ever been told.

And it has no idea what it was just doing.

The other kind of state

Memory tells you what is true. It does not tell you where you left off. Those are different questions, and once you have a fleet of any size, the second one starts to bite.

Concretely: an agent finishes a session mid-refactor. The next session, an hour or a day later, opens with a clean context. Nothing in memory says "the X.php refactor is half-done, you were waiting on Steve to confirm the schema change before deploying." That is not a fact about the project — it is a fact about a particular session that has already ended. It will not be true in two weeks. It is barely true now.

If you push it into memory, two bad things happen. The first is that the memory files start to fill with stale operational chatter. "Currently mid-debug on the regex" stops being currently true the moment the next session starts, but the file persists. Future sessions read it as if it were a standing fact. The second, and worse, is that you train yourself and the agent to use memory as a diary. Memory is for things that will still be true and useful in two weeks. A diary entry is true for an afternoon. The two systems decay differently and they need to live in different places.

So I built a separate system for it. One row per session, written when the session ends, read when the next session starts. I called it project_state, which is a deliberately boring name because it is a deliberately boring system. It is not a memory replacement. It is a reporting layer.

The discipline of the schema

The schema is the system. There are six fields, and the limits are the design.

The character limits are enforced at the database level for the VARCHAR fields and in code for the JSON arrays. Truncations and dropped items are logged to an alerts table, attributed to the offending agent. That sounds bureaucratic. The point is that without the limits, the discipline collapses inside a fortnight. An agent given an unlimited handover_notes field will, within a week, paste in a five-paragraph recap of the entire project, and the layer becomes useless. The constraint is the feature.
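To make the enforcement concrete, here is a minimal sketch using SQLite. The field names beyond status_summary, handover_notes, and source are my guesses, not the actual six-field schema, and SQLite does not enforce VARCHAR lengths, so this version clamps everything in code and logs truncations to the alerts table, as the text describes.

```python
import json
import sqlite3

# Assumed limits, matching the 280/500 figures from the text
LIMITS = {"status_summary": 280, "handover_notes": 500}
MAX_JSON_ITEMS = 5  # assumed cap on each JSON array field

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_state (
    agent          TEXT NOT NULL,
    session_id     TEXT PRIMARY KEY,
    status_summary TEXT,   -- VARCHAR(280) on a real RDBMS
    handover_notes TEXT,   -- VARCHAR(500)
    blockers       TEXT,   -- JSON array, capped in code
    source         TEXT NOT NULL DEFAULT 'auto'
                   CHECK (source IN ('claude', 'merged', 'auto'))
);
CREATE TABLE alerts (agent TEXT, message TEXT);
""")

def clamp(agent, field, value):
    """Truncate over-limit text, attributing the truncation to the agent."""
    limit = LIMITS[field]
    if len(value) > limit:
        conn.execute("INSERT INTO alerts VALUES (?, ?)",
                     (agent, f"{field} truncated from {len(value)} chars"))
        return value[:limit]
    return value

def clamp_items(agent, field, items):
    """Drop JSON array items past the cap, logging how many were lost."""
    if len(items) > MAX_JSON_ITEMS:
        conn.execute("INSERT INTO alerts VALUES (?, ?)",
                     (agent, f"{field}: dropped {len(items) - MAX_JSON_ITEMS} items"))
        items = items[:MAX_JSON_ITEMS]
    return json.dumps(items)
```

The point of routing every truncation through the alerts table is that the limits stay visible: an agent that habitually overwrites its budget shows up in the log, attributed by name.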

This is the same instinct that makes a good commit message format work. "Subject line, fifty characters, imperative mood" is annoying until you have read ten thousand of them and realise the discipline is doing real work. Same here. A 280-character status summary forces clarity. A 500-character handover field forbids you from re-explaining the project to yourself.

The Stop hook lesson

The original design called for a session-close hook that would prompt Claude for a structured summary as the session ended. The hook would gather the answers, write them into the row, and the next session would read the result. Clean. Symmetrical. Wrong.

Stop hooks fire after the model has exited. There is no model left to prompt. By the time the hook runs, the conversation is over and the agent is gone. You can call the API again from inside the hook, but that is a fresh model with no context — it does not know what the session was doing, because the session it would summarise has already disposed of the context window that contained the answer.

I caught this during the build, not the review, which is the polite way of saying I caught it after spending an afternoon on an architecture that could not work. The lesson was the kind that only shows up at runtime: a design that reads cleanly on paper can rest on an execution-order assumption that does not hold. The hook fires after the agent leaves the room. You cannot ask it questions on the way out.

The fix was to invert the dependency. The agent writes its own state, voluntarily, mid-session, via a /state update skill that UPSERTs the row. The close hook does not synthesise content — it only enriches whatever the agent already wrote with metadata it can compute itself: session duration, branch, last commit, peak context, heartbeat count. If the agent never wrote anything, a fallback cron picks the session up after a grace window and writes a stub from heartbeat data and git log alone, marked source='auto'. If the agent did write something, the close hook merges and marks the row source='merged'.
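The three write paths can be sketched as three functions against one table. This is a simplified model, not the actual implementation; the function names and the reduced column set are assumptions, and the real metadata (peak context, heartbeat count, last commit) is trimmed to a couple of representative columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE project_state (
    session_id TEXT PRIMARY KEY,
    status     TEXT,
    branch     TEXT,
    duration_s INTEGER,
    source     TEXT NOT NULL
)""")

def state_update(session_id, status):
    # /state update: the agent voluntarily writes its own summary mid-session
    conn.execute("""
        INSERT INTO project_state (session_id, status, source)
        VALUES (?, ?, 'claude')
        ON CONFLICT(session_id) DO UPDATE SET
            status = excluded.status,
            source = 'claude'
    """, (session_id, status))

def close_hook(session_id, branch, duration_s):
    # Close hook: add only metadata it can compute itself; it never
    # synthesises content. A row the agent wrote becomes 'merged'.
    conn.execute("""
        UPDATE project_state
        SET branch = ?, duration_s = ?, source = 'merged'
        WHERE session_id = ? AND source = 'claude'
    """, (branch, duration_s, session_id))

def fallback_cron(session_id, branch, heartbeats):
    # Cron after the grace window: a stub from telemetry alone, marked
    # 'auto'. INSERT OR IGNORE means it never clobbers an agent's row.
    conn.execute("""
        INSERT OR IGNORE INTO project_state (session_id, status, branch, source)
        VALUES (?, ?, ?, 'auto')
    """, (session_id, f"Stub: {heartbeats} heartbeats on {branch}", branch))
```

The inversion is visible in the shapes: only state_update writes status text, close_hook is a pure UPDATE of metadata, and fallback_cron can only ever fill a gap, never overwrite.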

So three kinds of row exist, and the source field is honest about which is which. claude: the agent wrote it. merged: the agent wrote it and the close hook added metadata. auto: nobody wrote it; the cron made one from telemetry.

Auto rows are not the same as Claude rows

The first version of the start-of-session injection loaded the latest three rows for the agent, regardless of source. Three rows, ordered by timestamp, dropped into the context. Done.

This was wrong, and it took three rounds of testing in a single agent's session to see why. auto rows are not bad data, but they cannot orient a fresh session. They say things like "Session ended after fourteen minutes on branch main. Peak context 62%. Three heartbeats." That is fine for the dashboard. It is useless to a model that wants to know what it was working on. Padding the injection with auto rows, just to hit a count of three, dilutes the substantive rows next to them. The fresh session reads three rows and finds that only one of them is real.

The fix was to drop auto rows from the injection entirely. Inject only claude and merged rows. Default to two, not three — two substantive rows beat three with one fluff entry. And for an agent with zero claude rows, inject a small honest message instead: "No prior /state writes recorded. Fresh start. Consider running /state update before this session ends so future-you has something to read."
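The injection logic reduces to one filtered query plus a fallback. Again a sketch under assumed names and a reduced schema, not the real code; the fallback string is the one quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE project_state (
    agent TEXT, ended_at TEXT, status TEXT, source TEXT)""")

FALLBACK = ("No prior /state writes recorded. Fresh start. Consider running "
            "/state update before this session ends so future-you has "
            "something to read.")

def continuity_injection(agent, limit=2):
    """Build the start-of-session injection: substantive rows only."""
    rows = conn.execute("""
        SELECT status FROM project_state
        WHERE agent = ? AND source IN ('claude', 'merged')
        ORDER BY ended_at DESC LIMIT ?
    """, (agent, limit)).fetchall()
    if not rows:
        # Empty state is information: say so, honestly, instead of
        # padding the context with telemetry stubs.
        return FALLBACK
    return "\n".join(status for (status,) in rows)
```

The WHERE clause is the entire fix: auto rows still exist for the dashboard, they just never reach the model.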

Empty state is information. A synthetic row pretending to be useful is not. The system rewards good citizens; agents who never voluntarily write state never get continuity injected. That is harsh, and it is correct, and after a week of running it I have not regretted the trade.

Read-only by design

The dashboard view I added on top of all this is read-only. There is no edit button. No comment box. No "mark resolved." The row is what the agent wrote when the session ended, and if it is wrong, the fix is in the next session, not in the UI.

This is a posture, not just a missing feature. The moment a human can edit the rows, the rows become a half-managed task tracker, and the discipline rots. The agent is the only writer. I am only ever a reader. If I want to correct something, I correct it in the next session by telling the agent, and the agent writes the corrected state when that session ends. The system has one author per row and the rest of us live with it.

I run the same pattern for memory — agents write their own memories, I do not edit them — and it works for the same reason. A store with one author per record is coherent. A store with two authors is a merge conflict waiting to be ignored.

What the system is, and is not

The test I find most useful is this. If a piece of information would still be true and useful two weeks from now regardless of what the session was doing today, it is memory. If it is only meaningful as "where did I leave off," it is project_state. "Steve prefers terse responses" is memory. "Half-finished refactor of X.php, blocked on Lamb's review" is project state. "The Pi disk historically fills up around update time" is memory. "Currently mid-debug on the backup-watcher cron, suspect the regex is wrong" is project state.

The two systems look superficially similar — both are persistent stores of textual state, both load at session start, both are written by the agent. They are not the same system, and the failure mode of conflating them is that memory fills with operational chatter and the agent stops trusting any of it.

None of this is exotic. The whole machine is a database table, a small skill the agent invokes mid-session, a close hook that adds metadata, a fallback cron for sessions that crash, and a hook on session start that injects the last two substantive rows. That is the entire system. It has been running across the fleet for a couple of weeks now, and the most useful thing about it is the most boring thing: when I open a fresh session with any agent in the fleet, the first thing I see is what it was doing the last time I spoke to it. Sometimes I had forgotten. The agent had not — because the agent before it wrote it down on the way out.

Memory tells you what is true. project_state tells you where you stopped. Build them as separate systems, give each one its own discipline, and the fleet stops drifting.