Last weekend, I deleted v1 of ProductOne. A full codebase. Thousands of generated lines. Zero regret.

A year ago that sentence would have been insane. Today it's rational. Here's what changed.

The old maths

Software is expensive because people are slow. A senior engineer writing a production-grade feature represents months of salary compressed into tens of thousands of lines. Throwing those lines away means throwing away their months. So we cling — we refactor, patch, migrate, add abstraction layers, tolerate bad decisions made six months ago because the cost of excising them feels bigger than the cost of living with them.

That maths built an entire industry of technical-debt-management frameworks.

The new maths

Now I can scope a product, split it into stories, and have an autonomous AI loop generate, test, review, and commit against those stories for twenty-four hours straight. On a recent run it committed thirty-two stories (feature, review-fix, feature, review-fix), with a real test suite running after every commit. Total human intervention: scoping at the start, a handful of context-tuning tweaks mid-run, and reading the git log in the morning.
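To make the shape of that loop concrete, here is a minimal sketch of the feature, review-fix rhythm. It is not the real Ralph or bmalph loop: `Story`, `run_backlog`, and the four callables are hypothetical names, and the sketch assumes the test suite gates each commit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Story:
    id: str
    title: str


@dataclass
class LoopResult:
    commits: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def run_backlog(
    stories: List[Story],
    implement: Callable[[Story], None],        # hypothetical: agent writes the code
    run_tests: Callable[[], bool],             # hypothetical: runs the real test suite
    review: Callable[[Story], List[str]],      # hypothetical: returns review findings
    fix: Callable[[Story, List[str]], None],   # hypothetical: agent addresses findings
) -> LoopResult:
    """Work through the stories in order: feature commit, then review-fix commit."""
    result = LoopResult()
    for story in stories:
        implement(story)
        if not run_tests():
            # Assumption: a red test suite blocks the commit and the story is parked.
            result.failed.append(story.id)
            continue
        result.commits.append(f"feat: {story.id}")

        findings = review(story)
        if findings:
            fix(story, findings)
            if run_tests():
                result.commits.append(f"review-fix: {story.id}")
            else:
                result.failed.append(story.id)
    return result
```

The human parts described above (scoping, context tuning, reading the log) sit outside this function, which is roughly the point.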

The code took a weekend of agent time. What took months of human time was the framework that let it generate high-quality code: the planning artefacts, the coding standards, the review prompts, the loop rules, the custom tools that caught real bugs. That's where the real work is now.

Which means: when you throw away a codebase, you're not throwing away months of human effort. You're throwing away a starting point whose constraints you no longer need. The framework survives. You regenerate the code, and it's better.

What v1 taught me

The most important thing to say about v1 is that I'm using the same tools in v2. Same BMAD-METHOD for planning. Same DesignOS for component specs. Same bmalph wiring BMAD into the Ralph execution loop. Same Claude Code. Same .NET, same Aspire, same Next.js, same PostgreSQL. The tools didn't change. The framework around them did.

Some of the biggest lessons, each of which now lives as a rule, a hook, or a standard in v2:

  • Story sizing is the single biggest predictor of success. v1 had stories that looked fine on paper but didn't fit inside a forty-minute implementation loop. v2 split the eight largest into seventeen vertical slices. It is boring to do. It is enormously effective.
  • Coding standards must be enforced by an adversarial review loop, not just documented. v2 has a separate Claude invocation that reviews every commit against fifty-plus rules: sealed classes, records for DTOs, CancellationToken on every async method, no DateTime.UtcNow in source, no mocking the database in tests. The rules existed in v1 only as a doc, and by loop thirty the code had drifted away from it.
  • Testcontainers, not mocks, for data-access tests. v1's mocks passed locally and the prod migration still broke. v2's integration tests spin up a real PostgreSQL via Testcontainers and run the actual EF Core migration. Slower. Worth every second.
  • Tool-use audit trails are non-optional. v2 logs every bash, edit, and write invocation Claude makes to a timestamped file. When a loop times out, I can see the exact command it was stuck on. v1 had no such visibility. The difference is the ability to diagnose versus guess.
  • Session-continuity defaults need revisiting per project. v1 leaned on cross-loop context and suffered context-rot drift. v2 starts fresh per loop by default. The trade-off between re-exploration cost and drift risk isn't universal; you have to measure it on your own codebase.
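Two of the review rules above are mechanical enough to illustrate with a plain text scan. This is a swapped-in technique, not how v2 does it (v2 uses a separate Claude invocation): the rule ids and `scan_csharp` are hypothetical, and the regexes are rough line-based heuristics, not a C# parser.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    line: int
    rule: str


# Rough heuristics only: single-line signatures, no comment or string handling.
UTC_NOW = re.compile(r"\bDateTime\.UtcNow\b")
ASYNC_METHOD = re.compile(
    r"\basync\s+[\w<>\[\],\s?.]+?\s+\w+\s*\((?P<params>[^)]*)\)"
)


def scan_csharp(source: str) -> List[Finding]:
    """Flag direct DateTime.UtcNow use and async methods lacking a CancellationToken."""
    findings: List[Finding] = []
    for number, line in enumerate(source.splitlines(), start=1):
        if UTC_NOW.search(line):
            findings.append(Finding(number, "no-datetime-utcnow"))
        match = ASYNC_METHOD.search(line)
        if match and "CancellationToken" not in match.group("params"):
            findings.append(Finding(number, "async-needs-cancellationtoken"))
    return findings
```

A scan like this can only catch the literal cases. Judgement calls such as whether a test mocks the database are the reason for using a reviewing model instead of a regex.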

None of these are tool choices. They are framework choices — the scaffolding around the tools.
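The audit trail is a good example of how small that scaffolding can be. Here is a minimal sketch, assuming each tool invocation reaches the logger as a dict with `tool_name` and `tool_input` keys. That shape, and the names `append_audit_record` and `last_record`, are assumptions for illustration, not a description of v2's actual logging.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def append_audit_record(
    log_path: Path,
    event: Dict[str, Any],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Append one timestamped JSON line describing a tool invocation."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "ts": stamp,
        "tool": event.get("tool_name", "unknown"),  # assumed field name
        "input": event.get("tool_input", {}),       # assumed field name
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def last_record(log_path: Path) -> Optional[Dict[str, Any]]:
    """Return the most recent record, i.e. what the loop was doing when it stopped."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1]) if lines else None
```

When a loop times out, reading the last line of a log like this answers "what was it stuck on?" without guessing.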

The improvements ripple outward

One interesting side-effect: because the framework is a tuned thing, improvements naturally want to leave the project. When I noticed Claude burning eleven minutes of a forty-minute loop on a dotnet test --filter incantation that does not work with TUnit, the fix was not just a PROMPT.md rule. It was a docs PR on TUnit itself so the next agent — anyone's — hits the fix rather than the gap.

Same pattern with bmalph: two fixes merged into my fork (a reaper for zombie build processes, a scoped MCP config to cut startup overhead) are also in upstream review. Over time this is how agentic-dev ecosystems mature — not through one big framework, but through many small corrections from people running the thing against real projects.

Where ProductOne is now

As I write, v2 is about 67% of the way through its committed backlog — thirty-three of forty-nine stories landed, six of nine epics closed, with the overnight run alone adding eight stories plus review-fixes in roughly twelve hours.
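For anyone who wants to check the arithmetic, here is a small sketch using only the figures above. The function names are hypothetical, and the projection at the end is a naive extrapolation that assumes the overnight pace holds for the remaining stories, which it may well not.

```python
def percent_done(done: int, total: int) -> int:
    """Completed share as a whole-number percentage."""
    return round(100 * done / total)


def hours_per_story(stories: int, hours: float) -> float:
    """Average wall-clock hours per landed story for one run."""
    return hours / stories


stories_done, stories_total = 33, 49
epics_done, epics_total = 6, 9

story_pct = percent_done(stories_done, stories_total)  # 33 / 49 is about 67%
epic_pct = percent_done(epics_done, epics_total)       # 6 / 9 is about 67%
remaining = stories_total - stories_done               # 16 stories left

pace = hours_per_story(8, 12.0)                        # overnight run: 1.5 h per story
naive_hours_left = remaining * pace                    # assumption: same pace throughout

print(f"{story_pct}% of stories, {epic_pct}% of epics, {remaining} stories remaining")
print(f"Naive projection: {naive_hours_left:.0f} loop-hours at {pace} h per story")
```

The projection ignores story size and review-fix churn, so treat it as a lower bound on effort, not a schedule.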

It is still the serious-fitness analytics platform I originally set out to build, aggregating workouts, nutrition, biometrics, and body composition into one phase-aware dashboard, but now built with a framework I actually trust.

The positioning has also broadened. I originally framed ProductOne as "a bodybuilder's tool, because I'm a bodybuilder frustrated by siloed data." That is still true. But the deeper realisation is this:

If you treat your body like a system, you deserve the tools of one.

The people who most need this — the serious trainees whose training, nutrition, recovery, and body composition are all interacting in real time — extend well beyond any single sport or presentation.

What's next

Closing out the remaining epics: body composition, the AI insights engine, goals and data export. Smoke-testing the whole thing end to end. Writing more detailed retrospectives on each epic's specific learnings.

And when v3 eventually comes around — because it will — the framework will be another order of magnitude sharper, and v2's code will be cheap to throw away.