Engineering · 9 min read · June 8, 2026

Responsible token-maxing.

Throwing tokens at a model gives you a demo that falls over on the second click. Here is how we let AI write production code in our own repo, with a plan, a memory, and a paper trail.

The Winsen Team

Published June 8, 2026

Token-maxing is a meme for a reason. You point a strong model at a problem, hand it a generous budget, and watch it go. For about ninety seconds it looks like magic. It scaffolds a whole app, picks libraries, writes tests it did not run, and hands you a green checkmark. Then you click the second button and the thing falls over, because the model was optimizing for a convincing first click, not for the tenth one.

We do this every day, on our own production codebase, and we got tired of demos that die on the second click. The fix is not fewer tokens. It is responsible token-maxing: tokens spent with a plan, a memory, and a paper trail, so the output is something you would ship instead of something you would screenshot. We built a tool to make that the default. It is called Rocketman.

Why the demos fall over

A coding model with no plan is a brilliant new hire who gets fired and rehired with amnesia every lunch. It cannot tell you why a decision was made last week because it was not there last week. It reaches for whatever stack it remembers, which is whatever was current at training time. That is already stale. It writes a function, forgets it exists, and writes a slightly different one two files over. None of this is a capability problem. It is a context problem, and you cannot brute-force context with more tokens any more than you can fix amnesia by talking louder.

The responsible version gives the model the three things a good human hire gets and an amnesiac one never does. A plan, so the work has a shape before a single line is written. A memory, so today's work knows what yesterday's work decided. A paper trail, so when something breaks at 2am you can read what happened instead of guessing. Everything below is how Rocketman makes those three real, in the repo, where the work happens.

The hub is a file, not an API

The center of Rocketman is one self-contained file: PM/index.html, sitting in your repo next to the code. It is an offline project-management hub with zero dependencies that runs on Node 18 and nothing else. It is a file, not a SaaS board, because the AI coding agent already lives in your repo. Asking it to read a file is free. Asking it to call an external project tool means an API token, an OAuth wall, a rate limiter, and the eventual morning where the integration is down and the work stops. A file has none of those failure modes. The model just opens it.

It is HTML on purpose. As Thariq Shihipar argued in The Unreasonable Effectiveness of HTML, HTML is the format models are weirdly good at, and a hub turns a document you would skim into one you would read closely. A kanban board, a diff, a decision tree, all visible at a glance instead of buried in a wall of markdown. The model gets a structured surface to reason over, and so do you, on the same file, with no second tool to keep in sync.

PM/index.html · Rocketman · offline · 0 deps · Node 18

Board · winsen-bridge · session #41 · token budget 218k / 500k

Backlog (5): Rate-limit the OTP endpoint; Migrate sessions to Redis

In Progress (1): Refactor the session pool (agent)

Review (2): PR #1284 SSO timeout fix (agent); Docs PR #1286 (you)

Done (12): Ship v4.2 hotfix (agent)

Decisions · latest

ADR-12: Session pool: Redis over in-memory. Survives a deploy.

ADR-11: Phased cutover beat big-bang. Reversible if v4.2 regresses.

One file in the repo. Kanban, token budget, and the decision log, all in a render the model reads as easily as you do.

Agents edit data, the build renders

Letting a model hand-edit a rendered HTML file is how you get drift, broken markup, and a state nobody can trust. So Rocketman splits data from presentation, hard. The source of truth is small, diffable JSON: core, tasks, spec, content. The hub HTML is generated from that JSON and read-only. A guard hook blocks hand-edits to the rendered file, and the build is deterministic, so the same data in produces a byte-identical hub out.

This is the part that makes the rest safe. Agents never touch the rendered surface. They edit the JSON, the build renders, and because the render is deterministic, a diff on the hub is a real diff on real state, not noise from a reformat. The board cannot lie to you, because the board is a pure function of data you can read in a pull request.
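To make the "pure function of data" idea concrete, here is a minimal sketch of a deterministic render plus a guard check. The Task shape, the function names, and the markup are illustrative assumptions, not Rocketman's actual schema or build; the point is only that a render with a fixed sort order and no timestamps or random ids yields the same bytes for the same data, so any mismatch with the committed file signals a hand-edit.

```typescript
// Hypothetical task shape; Rocketman's real JSON schema is not shown in this post.
interface Task {
  id: string;
  title: string;
  status: "backlog" | "in_progress" | "review" | "done";
}

const COLUMNS: Task["status"][] = ["backlog", "in_progress", "review", "done"];

// Escape text so a task title cannot break the generated markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Pure function of the data: fixed column order, tasks sorted by id,
// no timestamps and no random ids, so equal data gives equal output.
function renderHub(tasks: Task[]): string {
  const sorted = tasks
    .slice()
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const columns = COLUMNS.map((status) => {
    const items = sorted
      .filter((t) => t.status === status)
      .map((t) => `    <li data-id="${escapeHtml(t.id)}">${escapeHtml(t.title)}</li>`)
      .join("\n");
    return `  <section data-status="${status}">\n${items}\n  </section>`;
  });
  return `<main>\n${columns.join("\n")}\n</main>\n`;
}

// Guard check (assumed policy): the committed hub is trusted only if it
// is identical to a fresh render of the current data.
function isHandEdited(committedHtml: string, tasks: Task[]): boolean {
  return committedHtml !== renderHub(tasks);
}
```

A hook built on this idea would call isHandEdited with the committed file's contents and reject the commit when it returns true, which is one way a guard could tell a real state change from a manual tweak.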

More tokens do not fix amnesia any more than talking louder does. You fix it with a plan, a memory, and a paper trail.

It version-controls with the code

Because the hub and its JSON live in the repo, project state branches, diffs, reviews, and merges with the code it describes. This sounds small and it is the whole game. Open a feature branch and the project state for that feature comes with it. Review the PR and you review the plan and the code in one place. Merge and they land together. Compare that to a Linear board, which sits in a separate system that silently drifts the moment the code and the plan disagree, and nobody notices until a planning meeting three weeks later. State that merges with the code cannot drift, because drift is just two sources of truth, and there is only one.

The Track, from idea to launch

On top of the hub sits a Claude skill stack that takes an idea to production and keeps iterating. It is a sequence with gates, not a single heroic prompt:

→ Ideate: force the product thinking up front, before any code, so you are building the right thing.
→ Research: verify the current correct stack, live. This is the antidote to a model reaching for the stale stack it remembers from training.
→ PRD and Plan: write down the spec and the task breakdown, in JSON, in the hub.
→ Build: the board becomes a work queue. A parent agent picks up ready tasks and allocates them to sub-agents, a fleet working in parallel.
→ Verify: a build, lint, tests, and doctor gate. Machine-checkable, no vibes.
→ Test: a human test script, because some things only a person should sign off on.
→ Launch, then Iterate: ship it, then keep going, with the same paper trail.

PLAN.md · written before line one

✓ Read the failing test and the v4.2 diff
✓ Reproduce the SSO timeout locally
● Move the session pool behind Redis
○ Add a retry on cold connections
○ Backfill the regression test, then open the PR

The Track is a sequence with gates, not one heroic prompt. Every step has to clear before the next one starts.
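The Verify step in the Track can be pictured as a small gate runner. This is a sketch under assumptions: the Check type and runGate are hypothetical names, and the checks are injected as plain functions so the example stays self-contained, whereas a real gate would presumably shell out to the project's build, lint, test, and doctor commands.

```typescript
// Hypothetical types; the post describes the gate but not its implementation.
interface Check {
  name: string;
  run: () => boolean; // true means the check passed
}

interface CheckResult {
  name: string;
  passed: boolean;
  detail: string | null; // error message if the check threw
}

interface GateReport {
  cleared: boolean;
  results: CheckResult[];
}

// Run checks in order and stop at the first failure, so a later check
// never runs on top of a step that did not clear.
function runGate(checks: Check[]): GateReport {
  const results: CheckResult[] = [];
  for (const check of checks) {
    let passed = false;
    let detail: string | null = null;
    try {
      passed = check.run();
    } catch (err) {
      // A check that throws counts as a failure, with the reason recorded.
      detail = err instanceof Error ? err.message : String(err);
    }
    results.push({ name: check.name, passed, detail });
    if (!passed) {
      return { cleared: false, results };
    }
  }
  return { cleared: true, results };
}
```

Passing the four checks as build, lint, tests, doctor gives a report that is either cleared or names the first check that failed, which is what makes the gate machine-checkable rather than a judgment call.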

The Build step is where token-maxing earns its keep. The work queue feeds a parent agent that fans ready tasks out to a fleet of sub-agents. They run in separate terminals and coordinate through an agent relay: a conflict-free file bus they use to message each other and hand off tasks, surfaced in a Fleet view so you can watch the whole crew work. This is a lot of tokens. The difference from the meme is that every one of those tokens is spent against a plan, checked at a verify gate, with state under version control. That is the responsible part. The scale is the point, not the problem.
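One way to picture the work queue is a function that selects ready tasks and spreads them across sub-agents. Everything here is an assumption for illustration: the QueueTask fields, the definition of "ready" as "not started, with all dependencies done", and the round-robin policy. The agent relay and Fleet view are not modeled.

```typescript
// Hypothetical task shape with explicit dependencies.
interface QueueTask {
  id: string;
  status: "todo" | "in_progress" | "done";
  dependsOn: string[];
}

// A task is ready when it has not started and everything it depends on is done.
function readyTasks(tasks: QueueTask[]): QueueTask[] {
  const done = new Set<string>();
  for (const task of tasks) {
    if (task.status === "done") done.add(task.id);
  }
  return tasks.filter(
    (t) => t.status === "todo" && t.dependsOn.every((dep) => done.has(dep))
  );
}

// Round-robin allocation of ready tasks to sub-agents (one simple policy;
// the parent agent in the post may use a different one).
function allocate(tasks: QueueTask[], agents: string[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const agent of agents) plan.set(agent, []);
  if (agents.length === 0) return plan;
  readyTasks(tasks).forEach((task, i) => {
    const agent = agents[i % agents.length];
    const assigned = plan.get(agent);
    if (assigned) assigned.push(task.id);
  });
  return plan;
}
```

Tasks whose dependencies are still open stay on the board until a later pass, so the fan-out never hands a sub-agent work that is blocked on another sub-agent's unfinished task.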

PR #1284 · session-pool refactor · ✓ 142 checks passed

src/auth/session.ts

- const pool = new Map()
+ const pool = new RedisPool(env.REDIS_URL)
  return pool.acquire(sessionId)

ENG · Dev: Matches your pool conventions and the tests are green. One flag: line 42 drops the retry on a cold connection. I left a comment. Not merging, that call is yours.

Merge · Request changes · waiting on a human

Every token a sub-agent spends lands against a verified gate. The fleet writes it, the gate checks it, a person signs it.

Provenance, so you know who wrote what

Every item in the hub is attributed: human or agent, and which model, opus or sonnet or haiku. Items backlink to each other with [[id]] references, so a task points at its spec and its decision and the work that closed it. When something looks wrong, you do not have to wonder whether a person or a model made the call, or which model. It is written down. The AI drafts and proposes, the consequential calls wait for a human, and the record shows exactly who did which. You stay the one with judgment. The fleet stays the one with stamina.
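A small sketch of how attribution and [[id]] backlinks could be represented. The HubItem and Author shapes and the function names are assumptions, and the ids in the usage note are made up apart from ADR-12 and PR #1284, which appear in the post; only the [[id]] syntax and the human/agent/model attribution come from the text above.

```typescript
// Hypothetical shapes; the post names the attribution fields, not the schema.
type Author =
  | { kind: "human"; name: string }
  | { kind: "agent"; model: "opus" | "sonnet" | "haiku" };

interface HubItem {
  id: string;
  author: Author;
  body: string;
}

// Collect the ids referenced as [[id]] in an item's body, in order.
function extractRefs(body: string): string[] {
  const refs: string[] = [];
  const pattern = /\[\[([^\[\]]+)\]\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    refs.push(match[1].trim());
  }
  return refs;
}

// Invert the references: for each id, list the items that point at it.
function buildBacklinks(items: HubItem[]): Map<string, string[]> {
  const backlinks = new Map<string, string[]>();
  for (const item of items) {
    for (const ref of extractRefs(item.body)) {
      const sources = backlinks.get(ref) ?? [];
      sources.push(item.id);
      backlinks.set(ref, sources);
    }
  }
  return backlinks;
}

// Render the attribution the way a record line might show it.
function describeAuthor(author: Author): string {
  return author.kind === "human"
    ? `human:${author.name}`
    : `agent:${author.model}`;
}
```

With items such as a task whose body reads "Implements [[SPEC-3]] per [[ADR-12]]" and a PR note reading "Closes [[T-7]], see [[ADR-12]]", looking up ADR-12 in the backlink map returns both the task and the PR, which is the "task points at its spec and its decision" relationship read in reverse.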

DECISIONS.log

14:02 · Phased cutover, not big-bang · thread #eng-sso

13:30 · Redis over in-memory sessions · ADR-12

11:18 · Hold the docs PR until the fix lands · PR #1286

Who decided what, which model, and when. When something breaks at 2am you read the record instead of guessing.

It dogfoods itself

We did not build a demo of this and then build Rocketman by hand. Rocketman's own roadmap lives in Rocketman's own hub. The thing that takes an idea to production took itself to production, which is the only proof we trust. If we would not run our own repo on it, we would not have shipped it, and we certainly would not have told you to npx it.

Giving the model the things a good hire gets is what makes the difference: a plan to work against, a memory that persists, a paper trail that survives the morning after. Do that and token-maxing stops being a meme. It becomes a release process. Rocketman is open source, MIT, and one command away: npx @winsendotai/rocketman.

