This is the fourth in what has become an unplanned series.
The Last Programming Language argued that the entire history of programming languages was a project to close the gap between human intent and machine execution, and that LLMs are the endpoint because they execute natural language directly. Stop Doing AGENTS.md and It’s Model Context Protocol, Not Agent Context Protocol argued, from different angles, that most agent customization is wasted context - duplicated, task-specific, or self-evident to the model.
I think the same logic applies one level up: to the workflows and orchestration we wrap around the agents themselves.
The distance between "how humans already know work should be done" and "how AI agents do work" is collapsing, for the same reason the programming language gap collapsed - the knowledge was always trivially expressible. We just hadn’t told the machine yet.
The Cathedral#
I stopped writing code about a year ago. In its place, I developed a set of habits and techniques culminating in an elaborate orchestration system - Niko - that turned AI coding agents into something resembling a senior colleague with a rigorous process. Phase-gated workflows with complexity tiers. Memory banks split into persistent and ephemeral files. Preflight validation. QA loops. TDD enforcement. An archive system for long-term institutional memory. Mermaid diagrams as a conceit to my human desire to understand the process. A whole cathedral.
And it worked. Measurably, demonstrably better than vanilla agentic coding harnesses. My original adoption of Niko’s core turned GPT-4o into Sonnet 3.5, back in March 2025. Niko’s lineage traces back through vanzan01’s Cursor memory bank adaptation to the original Cline Memory Bank, community-created on the Cline Discord around early 2025. The genealogy matters because it shows that a lot of people, independently, arrived at the same conclusion: agents need structure, memory, and process to do good work.
We were all correct. And now we can all stop.
The Subsumption#
The tools have started eating the techniques we dress them up in. From Cline to Claude Code and Cursor, major harnesses and models are absorbing these behaviors natively. The gap between "community workaround" and "native feature" is compressing over time.
Here are some of the key techniques we’ve discovered:
🗺️ Planning#
The agent should research and plan before building. The single highest-leverage intervention - the difference between an agent that wanders and one that delivers.
Niko’s phased plan-then-execute workflow forced the agent through research, planning, and validation before a single line of code got written. This came from my own early lesson - from agentic dev experience and reading - that "you should make a plan before you start coding," which I then taught to my agents.
I used to coax the workflow by hand, before I found Niko’s ancestors in the Cursor Memory Bank and welcomed a formalization of the process.
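Stripped of its trappings, the phase gate is a few lines of control flow. A minimal sketch - `run_workflow` and `stub_agent` are illustrative names, not any tool’s real API:

```python
# Minimal sketch of a phase-gated plan-then-execute workflow.
# `stub_agent` stands in for a real model call - it just records phases.

PHASES = ["research", "plan", "execute", "validate"]

def run_workflow(agent, task):
    """Force the agent through every phase, in order, no skipping."""
    artifacts = {}
    for phase in PHASES:
        # each phase sees the outputs of all earlier phases
        artifacts[phase] = agent(phase, task, dict(artifacts))
    return artifacts

def stub_agent(phase, task, prior):
    return f"{phase} output for {task!r} (built on: {sorted(prior)})"

result = run_workflow(stub_agent, "add pagination")
print(result["execute"])  # execution only happens after research and plan
```

The point is structural: execution cannot begin until the earlier artifacts exist, which is exactly what the phase gate enforces.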
📂 Context Initialization#
The agent should understand the project it’s working in.
Niko’s memory bank initialization creates a structured set of persistent files - product context, system patterns, tech context - a variation of the Cline Memory Bank set, refined through personal industry experience, giving the agent a consistent and reliable understanding of the project across sessions.
Claude Code’s /init likewise scaffolds a CLAUDE.md file by scanning the codebase, but it only writes to that single file.
Cursor’s approach is actually more interesting and arguably better than the manual pattern: it computes embeddings for every file in your codebase and provides those alongside a brief tree/text summary of the project structure. This is indirect context - you embed the code so the model kinda-sorta knows it, without burning context window on a monolithic description. It’s the Stop Doing AGENTS.md thesis made manifest by tooling: instead of a giant global prompt telling the model about your code, the model just implicitly knows because the code is embedded. The brief structural summary gives just enough for the model to know where it might want to look - rather than the monolithic AGENTS.md antipattern.
Multi-file repo context documents are being actively developed; I see murmurs of it in my "Stop Doing AGENTS.md" spaces. /init will probably produce something closer to Niko’s multi-file structure before long. And how long before the tools add "after I finish executing a plan, go update the docs that /init touches"?
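The multi-file pattern itself is easy to sketch. Here’s a minimal, hypothetical scaffold - the file names follow the Cline Memory Bank convention, but the template contents are placeholders of my own invention:

```python
# Sketch: scaffold a multi-file memory bank on disk. File names follow
# the Cline Memory Bank convention; the contents are placeholders.
import tempfile
from pathlib import Path

MEMORY_BANK_FILES = {
    "productContext.md": "# Product Context\n\nWhy this project exists.\n",
    "systemPatterns.md": "# System Patterns\n\nArchitecture and conventions.\n",
    "techContext.md": "# Tech Context\n\nLanguages, frameworks, tooling.\n",
    "activeContext.md": "# Active Context\n\nCurrent focus and open questions.\n",
    "progress.md": "# Progress\n\nWhat works; what remains.\n",
}

def init_memory_bank(root):
    """Create any missing memory-bank files; never clobber existing ones."""
    bank = Path(root) / "memory-bank"
    bank.mkdir(parents=True, exist_ok=True)
    for name, template in MEMORY_BANK_FILES.items():
        path = bank / name
        if not path.exists():
            path.write_text(template)
    return sorted(p.name for p in bank.iterdir())

files = init_memory_bank(tempfile.mkdtemp())
print(files)
```

The non-clobbering check matters: initialization should be safe to re-run on a project that already has a living memory bank.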
🔁 The Ralph Wiggum Technique#
In July 2025, Geoffrey Huntley documented the technique: put an AI coding agent in a bash while loop:
```shell
while :; do cat PROMPT.md | claude-code; done
```
… and let it spin until it stabilizes. Each time through the loop, the context is fresh but the codebase has accreted changes, and the agent keeps making progress toward the goal in the prompt. Doesn’t matter if it doesn’t get it on the first try; after 50 tries overnight, it will. The name comes from Ralph’s energy - not his incompetence, but his unwavering, guileless commitment to the task at hand.
People put agents in Ralph loops and they shipped entire projects overnight.
Turns out you don’t actually have to type the prompt out each time; you can just feed the same prompt in over and over again!
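The loop generalizes beyond bash. A toy sketch of the same idea with an explicit stop condition - `run_agent` and `checks_pass` are stand-ins, not a real harness API:

```python
# Sketch: a Ralph loop with a stop condition - feed the same prompt to a
# fresh agent until the work passes its own checks. Both callables here
# are toy stand-ins, not a real harness API.

def ralph_loop(run_agent, checks_pass, prompt, max_iterations=50):
    for attempt in range(1, max_iterations + 1):
        run_agent(prompt)        # fresh context every time through the loop
        if checks_pass():        # e.g. the test suite finally goes green
            return attempt
    raise RuntimeError("did not stabilize within max_iterations")

# Toy stand-ins: the "codebase" accretes one unit of progress per pass.
state = {"progress": 0}
attempts = ralph_loop(
    run_agent=lambda prompt: state.update(progress=state["progress"] + 1),
    checks_pass=lambda: state["progress"] >= 3,
    prompt="Build the thing described in PROMPT.md",
)
print(attempts)  # 3 - it didn't get there on the first try, and that's fine
```

The fresh context per iteration is the whole trick: the agent never drags a polluted window forward, only the accreted state of the codebase.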
📝 Ephemeral Context and Compaction#
Models forget. The context window fills, and older content falls away. Often the work isn’t complete before the model can no longer ingest new information. You must find a way to persist what matters.
Cursor was one of the first major harnesses to tackle this by automatically compacting conversations that approached the limit: run the history through a summarizer, produce a summary document, start a new context window with the summary injected. More recently, Cursor added a technique that people - including me - had been doing manually: saving the original conversation transcript to disk before compacting, then referencing the full transcript in the next conversation alongside the summary.
Niko operationalizes and obviates this by making "recording the important things to disk" part of the workflow from the start. Niko’s ephemeral memory-bank files track the current task: a project brief, active context, progress, task lists, reflections, creative decisions. When Niko finishes a phase and it’s time for the human to make a decision, you open a new context window and run the next /niko-* command. Niko reads the memory bank from disk and picks up where it left off - clean context, full awareness. If you abort mid-phase, Niko’s record-keeping enables the agent to diff the code on-disk against the last memory bank entry to deduce what was lost and resume from the right place.
What this buys you is context windows as cattle, not pets, and your "Agent" is the state saved to disk and source control - something durable and portable. This technique largely sidesteps the problem of running out the context window and the associated risks of having it very full.
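The save-then-compact move is small enough to sketch. Everything here is illustrative - in practice the summarizer is the model itself, not a stub:

```python
# Sketch: when a conversation nears the window limit, write the full
# transcript to disk, then start the next window from a summary plus a
# pointer back to the transcript. `summarize` is a stub for the model.
import tempfile
from pathlib import Path

def summarize(messages):
    return f"[summary of {len(messages)} messages]"

def compact(messages, transcript_dir, limit=100):
    if len(messages) < limit:
        return messages                      # plenty of room left
    path = Path(transcript_dir) / "transcript-001.md"
    path.write_text("\n".join(messages))     # full history survives on disk
    return [summarize(messages),
            f"Full transcript on disk: {path.name}"]

history = [f"message {i}" for i in range(120)]
compacted = compact(history, tempfile.mkdtemp())
print(compacted)
```

The key design choice is that compaction is lossy but the transcript is not: the next session can always drill back into the full record when the summary proves too thin.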
✅ Validation Loops#
Verify the work. The plan must be good before building starts; what is built must be correct before the task can be considered complete.
Niko’s preflight and QA phases are genuine validation gates. TDD forcing - making the agent write tests first and use them as back-pressure - is a key value-add that Niko delivers.
Crucially, Niko’s validation doesn’t default to coming up for air if it fails - it loops back and repeats the attempt. Agent doesn’t get code perfect on the first try? You know what, a lot of humans don’t, either. Going back and reworking it is the name of the game!
🧠 Archival Memory#
Remember what you learned, not just what you’re doing. Long-term institutional memory that survives beyond the current task lets you improve over time.
Niko archives summaries of past work into the memory bank - a layer of long-term institutional memory that doesn’t come off-the-shelf in any of the major harnesses. But this is partly because archival is opinionated: maybe you’d "archive" in Jira tickets, or GitHub issues, or commits, or a changelog, or a wiki. Eventually there will be something native. Cursor started out using a single file - .cursorrules - as memory, but has since abandoned directly advocating any particular "memory" management pattern. Claude Code semi-advocates using various CLAUDE.md files as memory; Gemini CLI explicitly calls GEMINI.md a "memory." Most web-based chat interfaces already read past conversations for context. CodeRabbit has server-side Learnings that persist facts across code reviews.
Everyone agrees: this works and we need it.
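For illustration, one possible shape for that archive - a date-stamped entry per task, in a layout invented here rather than any tool’s native format:

```python
# Sketch: archive a finished task's summary and lessons where future
# sessions can find them. The layout is invented for illustration.
import datetime
import tempfile
from pathlib import Path

def archive_task(root, task, summary, lessons):
    archive = Path(root) / "memory-bank" / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    entry = archive / f"{stamp}-{task}.md"
    entry.write_text(f"# {task}\n\n{summary}\n\n## Lessons\n\n{lessons}\n")
    return entry.name

name = archive_task(
    tempfile.mkdtemp(),
    "add-pagination",
    "Added cursor-based pagination to the list API.",
    "Offset pagination breaks under concurrent writes; use cursors.",
)
print(name)
```

Whether the durable store is a directory, a wiki, or Jira is the opinionated part; the invariant is that lessons outlive the context window that learned them.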
🔀 Parallelization#
Break work into parallel tracks to progress as quickly as possible.
Parallelization was the one technique that couldnât be solved by better prompting. Every other innovation on this list was, at its core, a matter of telling the LLM inside the agent what to do differently. Parallelization required the harness to do something differently - you needed multiple context windows running simultaneously, with coordination between them.
Before harness support existed, I was literally cloning repositories into separate directories on my machine, launching a Cursor instance out of each location, and manually giving each one a different prompt. I was the load balancer. Claude Code’s early subagent support was more CLI-native but similarly manual - you could spawn subagents, but you were still the one deciding what ran where and reconciling the results. You’d handcraft hordes of subagents, or at least explicitly kick them off, to get decent parallelization on tasks - wiring up the topology yourself like a computational middle manager.
Parallelization matters disproportionately to the other techniques: itâs pure force multiplication, and it was the one place where no amount of clever process design could substitute for infrastructure the tool didnât yet provide.
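What the harness now does natively is, schematically, a fan-out/fan-in. A toy sketch with threads standing in for subagent context windows:

```python
# Sketch: fan tasks out to parallel workers and reconcile the results.
# `run_subagent` is a stub; in a real harness each call would be its own
# context window (a subagent or background agent).
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task):
    return f"done: {task}"

def parallelize(tasks, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves task order, so reconciliation stays deterministic
        return list(pool.map(run_subagent, tasks))

results = parallelize(["frontend", "backend", "docs"])
print(results)  # ['done: frontend', 'done: backend', 'done: docs']
```

The hard part was never this scaffolding - it was splitting the work into independent tracks and merging the results, which is exactly what the harnesses now coordinate for you.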
The Timeline#
So that’s a lot to manage, right? No wonder this AI stuff is hard and people struggle to get good results. Right?
In January 2025, yeah. But your information is outdated; take a look at this timeline:
February 2025 - The 📂 Cline Memory Bank emerges from the Cline Discord: structured markdown files giving agents persistent project 🧠 memory across sessions. Claude Code launches with /init for 📂 context initialization.
March 2025 - Claude Code v0.2.47 ships 📝 auto-compaction, automatically summarizing conversations when the context window fills. Before this, you managed the window yourself.
May 2025 - Cursor 0.50 ships Background Agents in preview: 🔀 parallelization without hand-wiring the topology.
June 2025 - Cursor 1.0 ships 🧠 Memories - persistent facts across sessions. (These later evolved into Cursor Rules, themselves another subsumption: a community concept absorbed, renamed, and integrated.)
July 2025 - Claude Code v1.0.60 ships custom subagents for 🔀 parallelization. Geoffrey Huntley documents the 🔁 Ralph Wiggum technique: agents in bash while loops, shipping overnight.
August 2025 - Claude Code v1.0.77 ships 🗺️ Opus Plan Mode: use Opus for planning, a lighter model for execution.
October 2025 - Cursor 2.0 ships 🗺️ Plan Mode. The "make a plan before coding" convention becomes a toggle.
November 2025 - Cursor 2.1 improves 🗺️ Plan Mode: the agent can now ask clarifying questions in the UI.
December 2025 - Cursor 2.2 adds Mermaid diagrams to plans - echoing vanzan01’s use of Mermaid for visual planning in the community memory banks - dispatches plan items to parallel agents, and ships multi-agent judging for 🔀 parallelization. Claude Code v2.0.60 ships background agents.
January 2026 - Claude Code v2.1.0 ships /plan as a first-class slash command. Both tools now detect when you’re trying to plan and enter 🗺️ plan mode unprompted.
February 2026 - Claude Code v2.1.59 ships 🧠 auto-memories and v2.1.32 ships 🔀 Agent Teams.
March 2026 - Claude Code v2.1.63 ships 🔁 /loop. The Ralph Wiggum technique is now a built-in command.
Not yet absorbed
- ✅ Validation loops. Preflight gates. TDD forcing. QA checkpoints.
- 🧠 Archival memory: long-term institutional memory that survives beyond the current task, stored somewhere durable and accessible.
Thirteen months. From the first community workaround to nearly-complete native absorption of every technique that mattered.
The Scorecard#
Niko does almost everything listed above, usually at least slightly better than the native version. But for anyone starting today, the built-in tools are past good enough. The delta is real but the delta is shrinking and the floor keeps rising. I would not tell a newcomer to learn Niko. I’d tell them to learn Cursor or Claude Code: cleanly, clearly, and fully type their task into the box and let the agent work.
And that baseline would be good enough.
The Napkin#
The reason all of these behaviors get absorbed so easily is that the underlying knowledge fits on a napkin.
"Plan before executing." "Test before shipping." "Remember what you learned." "Break big work into small work." "Validate before declaring done." "Archive what you did for next time."
These aren’t arcane insights; they’re things every human business and workflow has understood for decades to centuries, if not millennia. The entire Niko ruleset - the mermaid diagrams, the phase gates, the memory bank, the complexity tiers - is an elaborate encoding of wisdom that, stripped of implementation details, is just a handful of simple instructions.
The only reason this wisdom wasn’t already in the tools is that no one had gathered it all together and put it in yet. Now they largely have. A single engineer can encode "plan before executing" into a system prompt or fine-tuning signal and it just works. Three words. When you omit them, you get what you asked for, which is execution without planning. Remembering to say them was the middle step - the era where a cottage industry of AI optimization tips emerged to teach people what amounts to basic project management. Embedding three words into a model’s harness or system prompt is close to trivial for where the tool makers are now.
And with each generation of model, the napkin gets shorter.
The subsumption timeline tracks the harness absorbing community techniques - tool makers encoding process wisdom into native features. But there’s a deeper layer: the models themselves absorbing behaviors that neither the harness nor the prompts need to teach anymore. Twice, I’ve solved a "model won’t follow instructions" problem not by writing better prompts or more elaborate process scaffolding, but by bumping the model. Once from Sonnet 3.7 to Sonnet 4.0, once from Sonnet 4.5 to Opus 4.6. In both cases, behaviors I’d spent real effort trying to wring out through prompts - staying on task, following the plan without wandering, respecting phase gates - just happened on the new model without being asked. The elaborate scaffolding wasn’t compensating for a process gap. It was compensating for a capability gap, and the capability gap closed.
Niko’s /niko-* command structure - and the directive to the operator to use only these commands to navigate the workflow, ensuring Niko is in play at each step - exists because earlier models would wander off if you didn’t force them through checkpoints. Niko’s commands serve double duty, reinforcing the process and reinforcing context persistence out to the on-disk memory bank. That’s a band-aid over attention span and instruction adherence. Better models don’t need the band-aid. Some of those napkin words stop being necessary at all - not because someone built them into the harness, but because the model internalized them during training.
Some wisdom is visibly absorbed into the harness, and some is invisibly absorbed into the model’s base capabilities. And armed with that wisdom, the models are building their own harnesses and training their own successors. Force multiplication. Positive feedback loops. Each generation of model is better at specifying the behavior that makes the next generation more effective. The napkin gets shorter. Some of the words disappear entirely.
The pièce de résistance of subsumption is Boris Cherny, creator of Claude Code, and his "vanilla Claude Code" setup. Despite the absence of significant third-party addons, Boris is unarguably a power user, and his "vanilla" setup is more complex than what most other Claude Code users run! If you read through it, you’ll see a lot of echoes of all the techniques described above, just solved with "vanilla" Claude Code!
The practical upshot, stated directly: unless you’re on the bleeding edge and could write an essay on why a given behavior exists and how to do it better, off-the-shelf is beyond good enough, and trying to optimize it yourself is time you could spend building the thing instead.
We am Become Wiggum#
We chuckled when we named the Ralph Wiggum technique. We put agents in bash loops and they just kept doing their best, bless their hearts, until they shipped. We tuned them like guitars, erected signs at the top of slides, and watched them cheerfully, relentlessly build. We felt clever. We were clever.
Now look at us.
Poking and prodding at the agentic process. Fiddling with orchestration. Tuning prompts. Adding yet another rule to AGENTS.md. Ooh, maybe a Skill this time! Reading blog posts about the optimal number of subagents (how big should your Gas Town be?). Installing one more MCP server, just in case.
"I’m helping!"
Bless our hearts. For all the cheek, Ralph actually shipped. The human in the workflow’s loop is increasingly just adding latency, not value.
The good news is that unlike Ralph, we can recognize the loop and step out of it. The practical takeaway is simple: off-the-shelf is more than good enough. Specify well - good requirements, good acceptance criteria - and let the subagents work on your plan. Pave your desire paths, hand over the keys, and get out of the way.
The Keys#
What’s actually left for us to do, then? Let’s talk about those "keys" we’re handing over.
Authentication and authorization.
This is the one place where the human’s role isn’t "know something the agent doesn’t" - that’s a knowledge problem, and knowledge problems dissolve once you can express the answer in natural language to a sufficiently capable model.
Auth is a trust problem. You gh auth login so the agent can push. You aws sso login so it can deploy. You grant the filesystem access, the API keys, the OAuth flows. You pave the desire paths the agents will follow. MCP handles auth separation well when it applies, but the core act is still yours: being the human who says "yes, you may."
This too is eroding, which should be no surprise because it was never a hard boundary in the first place. Organizations already delegate trust to automated systems: CI/CD pipelines hold credentials, service accounts have scoped permissions. Kubernetes operators rotate secrets without asking anyone. The trend line points toward the auth boundary around agents dissolving from the edges as organizations get comfortable granting progressively broader trust to automated actors.
In 1965, Gordon Dickson wrote a short story called Computers Don’t Argue in which a man receives a book club shipment he didn’t order. He tries to return it. Automated correspondence systems escalate the dispute through increasingly severe bureaucratic channels - billing, collections, legal, criminal - along the way accumulating transcription errors such that Mr. Walter A. Child’s return of the book "Kidnapped" by Robert Louis Stevenson becomes a record that Walter "kidnapped" A. Child (Robert Louis Stevenson [deceased]).
At no point does a human ever intervene to apply judgment. Every system in the chain has the authority to escalate but not the judgment to stop. The trust chain between systems is treated as sufficient with no need for a human checkpoint. The man is convicted and sentenced to death over a book order. DRY violation as Kafkaesque horror: his innocence was the canonical truth, but no system was configured to reference it.
What was missing from Dickson’s chain wasn’t technology. It was a manager - someone with the authority to review what the systems had collectively concluded and say "this is obviously a book return, not a kidnapping." Humans have been running organizations this way for millennia:
"Delegate authority with oversight checkpoints."
The solution was always available; nobody applied it. Five more words on the napkin.
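Those five words are also easy to encode. A toy sketch of an escalation chain with a human checkpoint wired in - the stage names echo Dickson’s story; everything else is invented for illustration:

```python
# Sketch: delegate authority with oversight checkpoints. Stages below a
# threshold escalate automatically; past it, a human must sign off.

CHAIN = ["billing", "collections", "legal", "criminal"]
HUMAN_REVIEW_FROM = 2    # from "legal" onward, machine authority ends

def escalate(case, human_approves):
    trail = []
    for level, stage in enumerate(CHAIN):
        if level >= HUMAN_REVIEW_FROM and not human_approves(case, stage):
            trail.append(f"{stage}: halted by human review")
            break
        trail.append(f"{stage}: escalated automatically")
    return trail

# A human who actually reads the case stops the absurd escalation.
trail = escalate(
    "book return transcribed as kidnapping",
    human_approves=lambda case, stage: "kidnap" not in case,
)
print(trail[-1])  # legal: halted by human review
```

The design point is that the checkpoint has the authority to stop the chain, not merely to comment on it - the one property every system in Dickson’s story lacked.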
Prompt engineering is dust. Context management skills are ashes. What remains, after everything else has been automated away, is the architecturally unglamorous, existentially critical work of ensuring that somewhere in every automated chain, a human can check whether the system is still working toward the outcome that was actually intended - and redirect it if it’s not.
For now, that human is you. You’re a manager now: the machines don’t need your help figuring out how to do their jobs anymore.