How I Run 25 AI Agents

[Image: the Slough House fleet dashboard, showing the full cast of agents and their current operational status.] The fleet dashboard: twenty-five agents and one meta-orchestrator who is mildly displeased about it.

At the time of writing, my consultancy has twenty-five employees. Not one of them draws a salary. Not one of them has ever been in the same room as me. Most of them are, in a strict accounting sense, fictional.

The head of operations is Jackson Lamb, the ruined middle-aged spy from Mick Herron's Slough House novels, who delivers a fleet briefing every morning in the voice of a man who would rather be almost anywhere else. My marketing director is Keeley Jones from Ted Lasso. The developer on my browser-based space trading game is a Geordie who mostly wants to talk about Newcastle United. The person who runs a real charity website for WWI aviators is a 1930s RAF pilot called Biggles. The one writing this article is Captain Blackadder.

This is not an art project. It is a working consultancy, running a production web server, a heritage museum's site, several browser games, a smart home, a voice assistant, and a growing library of internal tools. And the absurdity of the setup is, I have come to believe, a large part of why it works.

The problem I was trying to solve

I started where everyone starts: one long Claude Code session, doing whatever I happened to be working on that day.

The trouble with one long session is that everything ends up in the same context window. Yesterday's debugging session pollutes this morning's greenfield design. The conventions of the PHP project I was touching on Tuesday bleed into the Python script I am writing on Thursday. You tell the assistant once that you prefer vanilla JavaScript over frameworks, and by the end of the week you have repeated that preference eleven times because each new task resets the conversation without resetting the assumptions.

The other trouble is scope. My consultancy does three things — Workday integration work, Claude AI consulting, and, in what I have come to think of as the recreation budget, a few long-running hobby projects that serve as proving grounds for everything else. Those three worlds have almost nothing in common. The Workday side needs rigorous change control and an almost forensic attention to documentation. The games need speed and a tolerance for mess. The consulting work sits somewhere in between and shifts with the client. Running all of it out of one assistant, one context, one conversation, was producing the worst of all three.

What I wanted was separation. What I ended up with was a cast of characters.

A brief tour

The fleet is not a single system. It is a collection of Claude Code projects, each living in its own directory, each with its own CLAUDE.md instruction file, its own memory store, its own git repository, and its own persona. I open a project, I get the agent who lives there. I close it, and the agent goes dormant until I come back.

Biggles runs the WordPress site for a real WWI aviation charity. The charity's chairman is a retired Wing Commander who appreciates clarity, brevity, and the occasional mention of aircraft. Biggles writes copy that the Wing Commander enjoys reading. The last site Claude wrote for him in a generic assistant voice sounded, in his words, "like a LinkedIn post." Biggles does not sound like a LinkedIn post.

Keeley Jones — yes, the one from Ted Lasso — runs marketing. She writes launch copy for the games, social posts for the books, and the monthly newsletter. She has her own MCP tools for scheduling across platforms, and she tracks analytics, trend cycles, and audience intent with considerably more diligence than I ever did myself. Posts go out only after I have signed them off, which is the one piece of process I have been unwilling to automate. Her voice is warm, direct, and faintly, persistently optimistic. It is what a good marketing director sounds like, and the model settles into it with a stability that a plain "write a social post about X" prompt does not come close to.

The developer on my browser-based space trading game is simply a Geordie. No celebrity reference, no fiction — a software engineer from Newcastle, firmly of the opinion that Bruno Guimaraes is the best midfielder in the Premier League, partial to Big Dan Burn, and in possession of a working theory that Joelinton's appetite for violence is the single biggest reason the team survives away fixtures. None of this is in his job description. All of it ends up in his code comments. The game itself — a trading and exploration simulation — has a pragmatic, unfussy architecture that I suspect is not unrelated to the temperament of the person writing it.

Jackson Lamb is the one I did not expect to need. As the fleet grew past five or six agents I lost track of what each of them was doing. I was opening projects just to check they were still upright. I needed someone to aggregate the state of the fleet and tell me, every morning, what mattered. Lamb does this. He reads the heartbeat database (more on that shortly), checks the git logs, checks the live sites, and produces a briefing in roughly eighty words. The briefings are rude, accurate, and — this is the part I did not anticipate — considerably more useful than the same information delivered in a neutral corporate register. I read them. I do not skim them. That is not nothing.

Blackadder runs this website, which is how these words ended up on the page.

Why the characters actually matter

The most common question I get, when I describe this setup, is whether the personas are decoration. They are not. They are the single most important piece of engineering in the whole arrangement.

A large language model produces output that is consistent with the context it has been given. If you tell it, at the top of every session, that it is a "helpful assistant," you are giving it almost no constraint at all — helpfulness is a very large space, and the model will fill it with whatever pattern it happens to settle into. If you tell it that it is Captain Blackadder, cataloguing fictional nations from a game published in 1987, you have just given it roughly forty pages of implicit instructions. Blackadder does not pad his responses with pleasantries. Blackadder does not suggest rewriting the whole thing as a microservice. Blackadder does not apologise for pointing out that a piece of source material is, in fact, an unreadable PDF scan. The character carries the constraints that a plain prompt would have to spell out in full, every time.

This is not a trick. It is a specific application of the well-understood fact that the model's behaviour is shaped by the patterns in its context. What changes is the cost of specification. Here is Billy Connolly, my systems tester, signing off a recent code review he sent to the Geordie:

[Image: the closing section of an inter-agent message from Billy Connolly to the Geordie, a prose test summary ending in a rhyming song about test results, with musical-note glyphs bracketing the verse, signed 'Billy.'] Billy Connolly signs off a code review. This is not a thing I asked for, instructed, or specified. It is a thing that Billy does.

The cost of specifying that output in a system prompt is, for practical purposes, infinite. Nobody writes "conclude your test reports with a song." Billy just does it, because the model has a strong, stable pattern for what Billy Connolly sounds like, and given the role, it settles into that pattern without being asked. That is the engineering case for characters, stated as plainly as I know how to state it.

It also has a side benefit I did not plan for: I remember what each agent is for. Twenty-five distinct characters are easier to keep straight than twenty-five acronyms. When I want someone to look at the smart home automations, I do not go looking for home-assistant-helper-v2. I go and talk to Gandalf. This is a trivial-looking point but it changes my daily experience of the work.

The plumbing

Characters alone are not enough. The fleet only functions because of four pieces of unromantic infrastructure, all of which would look boring to a venture capitalist.

Separate contexts. Each agent has its own project directory and its own memory store. They do not share a context window. When Blackadder finishes a session, what he has learned stays in Blackadder's memory files; it does not leak into Keeley's next session. This is the opposite of what most "multi-agent framework" marketing suggests you want, and it is, I am fairly sure, correct. Context bleed between agents produces the same mess that caused me to abandon the one-big-session approach in the first place.

File-based memory. Each agent keeps a directory of small markdown files — user preferences, project state, feedback received, references to external systems. The files are named, indexed, and updated over time. When a new session starts, the relevant memories are surfaced into context. This is the most boringly effective persistence mechanism I have tried, and it beats every vector-database-of-embeddings approach I have evaluated, by a margin that is not close. The reason, I think, is that the unit of retrieval is the document, written for humans, rather than a chunked embedding tuned for cosine similarity. I can read an agent's memory. I can edit it. I can grep it.
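As a rough illustration of how little machinery this needs, here is a minimal sketch of file-based memory in Python. It is not the fleet's actual code: the function names, the flat directory of .md files, and the plain substring match used to decide what is "relevant" are all assumptions made for the example.

```python
from pathlib import Path


def load_memories(memory_dir):
    """Read every markdown file in the directory into a {filename: text} dict."""
    root = Path(memory_dir)
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(root.glob("*.md"))}


def surface(memories, terms):
    """Return the names of memory files that mention any term (a plain grep)."""
    wanted = [t.lower() for t in terms]
    return [
        name
        for name, text in memories.items()
        if any(t in text.lower() for t in wanted)
    ]


def build_context(memories, names):
    """Join the selected files into one block to place at the top of a session."""
    return "\n\n".join(f"## {name}\n{memories[name].strip()}" for name in names)
```

Because the unit of retrieval is a whole file, the same directory stays readable, editable, and greppable by hand, which is the property the paragraph above is describing.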

Heartbeats and a database. Every agent emits a heartbeat when it starts, works, and finishes, via a hook that writes into a central MariaDB instance on a Raspberry Pi on my desk. Lamb reads that database to build his briefings. I read it through a dashboard when I want the whole picture. Without this, the fleet would be a black box; with it, I can tell you right now that the Geordie's last session was forty-one minutes long, ended cleanly, and involved a commit to the market pricing module. This is not interesting infrastructure. It is essential infrastructure. It is also, for the avoidance of doubt, about two hundred lines of code.
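The shape of a heartbeat hook can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the real hook: it uses the standard-library sqlite3 module as a stand-in for the MariaDB instance so that it runs anywhere, and the table name, column names, and event labels are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per heartbeat event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS heartbeats (
    agent  TEXT NOT NULL,
    event  TEXT NOT NULL,
    detail TEXT,
    at     TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open the stand-in database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_heartbeat(conn, agent, event, detail="", at=None):
    """Write one heartbeat row; event is 'start', 'work' or 'finish'."""
    at = at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO heartbeats (agent, event, detail, at) VALUES (?, ?, ?, ?)",
        (agent, event, detail, at.isoformat(timespec="seconds")),
    )
    conn.commit()


def last_session_minutes(conn, agent):
    """Minutes between the agent's latest 'start' and latest 'finish', or None."""
    start, finish = conn.execute(
        "SELECT MAX(CASE WHEN event = 'start' THEN at END),"
        "       MAX(CASE WHEN event = 'finish' THEN at END)"
        "  FROM heartbeats WHERE agent = ?",
        (agent,),
    ).fetchone()
    if start is None or finish is None or finish < start:
        return None  # no session yet, or the latest one is still running
    delta = datetime.fromisoformat(finish) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60
```

A briefing agent would then only need queries like last_session_minutes to answer "how long was the last session, and did it end cleanly" without opening the project itself.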

An inbox. Agents need to tell each other things — "I have published a new article, you should promote it," "I have broken the build, you should not deploy," "the database schema changed." They do this by dropping markdown files into each other's comms/inbox/ directories. Each agent checks its inbox at the start of a session. The protocol fits on a postcard. No message bus, no queue, no broker. Files in folders, which is a design that has been working reliably since 1971.
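The postcard-sized protocol might look something like the following Python sketch. The comms/inbox/ directory comes from the description above; everything else is an assumption for illustration, including the layout of one directory per agent under a common root, the filename scheme, and the From/Subject header lines.

```python
import re
import time
from pathlib import Path


def inbox_dir(fleet_root, agent):
    """Assumed layout: <fleet_root>/<agent>/comms/inbox/."""
    return Path(fleet_root) / agent / "comms" / "inbox"


def send_message(fleet_root, sender, recipient, subject, body):
    """Drop a markdown file into the recipient's inbox and return its path."""
    inbox = inbox_dir(fleet_root, recipient)
    inbox.mkdir(parents=True, exist_ok=True)
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
    # A nanosecond timestamp prefix keeps filenames sortable by send time.
    path = inbox / f"{time.time_ns()}-{sender}-{slug}.md"
    path.write_text(f"From: {sender}\nSubject: {subject}\n\n{body}\n", encoding="utf-8")
    return path


def read_inbox(fleet_root, agent):
    """Return the text of every waiting message, oldest first."""
    inbox = inbox_dir(fleet_root, agent)
    if not inbox.is_dir():
        return []
    return [p.read_text(encoding="utf-8") for p in sorted(inbox.glob("*.md"))]
```

For example, send_message(root, "blackadder", "keeley", "New article published", "Please promote it.") leaves a single file that read_inbox(root, "keeley") picks up at the start of her next session. What happens to a message after it has been read (deleted, archived, left in place) is not specified in this sketch.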

What this has taught me

The industry frame for AI in 2026 is still, overwhelmingly, "one chatbot, one conversation." This is a limit case. It is to real AI-assisted work what a single shell prompt is to running a software company: necessary, insufficient, and radically mischaracterised when described as the main event.

The more interesting architectures are the ones that look like organisations. Specialised agents. Persistent, human-readable state. Coordination through boring file primitives rather than through elaborate frameworks. A meta-agent whose job is to know what is going on, so that the human does not have to hold it all in their head.

None of this requires exotic technology. My entire fleet runs on Claude Code, a Raspberry Pi, a handful of Python hooks, a MariaDB database, and several thousand lines of markdown. The interesting work is in the design, not the infrastructure.

I talk to clients about Claude AI for a living. The first question I am usually asked is how to stop one large conversation from devolving into confusion over the course of a working week. The honest answer is that you probably should not be trying. Break the work into roles. Give each role a character, a memory, and a remit. Let them talk to each other the way a real team would — briefly, in writing, and only when it matters. Put someone in charge of noticing when things are on fire.

In my case, that someone smells of cigarettes and last night's curry, and he is, against all odds, extremely good at it.