The TDD-with-agents rule
How I actually develop with Claude Code — the working document, the rule that tests can't change once written, reviewing big PRs, and catching the agent cutting corners.
In this post I’m sharing how I develop with Claude Code. It’s an extension of an earlier post, where I wrote about a year of moving from writing my own code to letting Claude Code do most of it.
Start with a working document
Once a proposal is kicked off, I set Claude Code to work. The first step is to create a working document in the ~/.claude directory. This document records the current status of the work and the list of things still to do. It also captures the decisions and pivots we made after the initial plan, and why we made them. This is what lets the agent retain context after compaction, or when starting a new chat session. The instruction for creating and maintaining the working doc lives in my global ~/.claude/CLAUDE.md, so the agent does it on its own at the start of every session. I’ve published the relevant parts of my CLAUDE.md in a public repo, nawaaz-dev/agentic, so you can drop them into your own setup. They include the working-document instructions and the TDD rules I’ll describe below.
Lay out the plan — and question it
The next step is to lay out the development plan clearly and add it to the working doc. I ask Claude to lay out the sub-tasks, their importance, their sequence, and their interdependencies. This helps me visualise the plan and build a mental model of what the work will look like.
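Writing the interdependencies down explicitly also pins down the sequence. As a minimal sketch, assuming the sub-tasks are recorded as a map from each task to the tasks it depends on (the task names below are hypothetical), Python’s standard-library `graphlib` can group them into ordered batches. Every task in a batch depends only on earlier batches, and tasks within the same batch are independent of each other:

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks, each mapped to the sub-tasks it depends on.
subtasks = {
    "db-migration": set(),
    "feature-flag": set(),
    "repository-layer": {"db-migration"},
    "api-endpoint": {"repository-layer"},
    "ui-form": {"api-endpoint"},
    "rollout": {"ui-form", "feature-flag"},
}


def plan_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into ordered batches; raises graphlib.CycleError on circular deps."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these is already done
        batches.append(ready)
        sorter.done(*ready)
    return batches


for step, batch in enumerate(plan_batches(subtasks), start=1):
    print(step, ", ".join(batch))
```

A cycle in the map (two sub-tasks waiting on each other) makes `prepare()` raise `CycleError`, which is itself a useful signal that the plan needs questioning.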
The crucial part here is not to blindly agree to whatever the agent puts in front of you. Instead, ask the agent why it wants to go this way. The reasoning almost always reveals something: either the agent has misread part of the task, or we don’t share the same understanding of the implementation plan. Either way, it’s a good point to align myself with the agent, and the agent with me.
Tests are the source of truth
The technique I follow for development is TDD (for unit testing): tests before the implementation, or tests alongside a stubbed implementation. The tests go in the RED commits and the implementation in the GREEN ones. I impose a strict rule: once the tests are written, they cannot be updated. The tests become the source of truth for the agent to follow. Sometimes I have another agent review the tests to make sure there are no gaps in coverage. Because the tests are committed, any change the agent later makes to them is visible in the diff, and I scrutinise every such change. (The exact phrasing of this rule lives in that CLAUDE.md.)
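To make the RED/GREEN split concrete, here is a minimal sketch with a hypothetical `parse_duration` function and pytest-style tests (plain `assert`, so pytest can collect them). The RED commit holds the tests and a stub that raises `NotImplementedError`; the GREEN commit replaces only the stub and leaves the tests exactly as committed:

```python
import re

# RED commit: the tests below are written first, against a stub such as
#     def parse_duration(text: str) -> int:
#         raise NotImplementedError
# so every test fails. From this commit on, the tests are frozen.

# GREEN commit: only the implementation changes.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART_RE = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Convert strings like '1h30m' or '45s' into a number of seconds."""
    # Reject empty input and anything left over after removing valid parts.
    if not text or _PART_RE.sub("", text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _PART_RE.findall(text))


def test_hours_and_minutes():
    assert parse_duration("1h30m") == 5400


def test_seconds_only():
    assert parse_duration("45s") == 45


def test_rejects_garbage():
    try:
        parse_duration("soon")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Running `pytest` on the RED commit shows all three tests failing against the stub; on the GREEN commit they pass without a single test line having changed.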
It’s common for the agent to forget proper mocks, and fixing those is the one kind of change I allow in the test files. The logic of the tests cannot change unless the agent makes a strong case for why the change is necessary; if the reason is solid, I allow it. On long-running tasks the agent tends to cut corners, so I watch for this and steer it back when it goes astray.
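Spotting those test edits doesn’t have to rely on memory. One way to surface them, sketched under assumptions: given the hash of the RED commit, list the files that have changed since, then filter them down to test files. The naming conventions (`test_*.py`, `*_test.py`, a `tests/` directory) and the helper names are hypothetical; the only external call is `git diff --name-only`.

```python
import subprocess
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed naming conventions for test files; adjust to your project's layout.
TEST_PATTERNS = ("test_*.py", "*_test.py")


def is_test_file(path: str) -> bool:
    """True if the path sits under a tests/ directory or matches a test filename pattern."""
    p = PurePosixPath(path)
    return "tests" in p.parts[:-1] or any(fnmatch(p.name, pat) for pat in TEST_PATTERNS)


def touched_tests(changed_paths: list[str]) -> list[str]:
    """Filter a list of changed paths down to the test files among them."""
    return sorted(p for p in changed_paths if is_test_file(p))


def changed_since(red_commit: str) -> list[str]:
    """Tracked paths that differ between the RED commit and the working tree (needs git)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", red_commit],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Calling `touched_tests(changed_since(red_sha))` before reviewing a GREEN commit lists every test file edited since the RED commit. An empty list means the source of truth is intact; a non-empty one still needs a human look, since the script can’t tell an allowed mock fix from a change to the test logic.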
Pause and review
Once the PR is ready, or has reached a point where I should look at it, I pause the agent and review the work. On big PRs it’s always better to pause and check in before things go too far: it lets me correct the agent’s mistakes before they grow into something major.
One way I do this is by having the agent write integration tests. If the feature has reached a point where a chunk of it can be tested, I tell the agent to write an integration test for it. I find integration tests the best way to test a feature in a sandbox: the main changes run against real data, while non-essential parts are mocked. If everything works as expected, I sign off and we move forward. Otherwise, the agent corrects the issues.
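A minimal sketch of that split, using a hypothetical order feature: the core path (`place_order`, an illustrative name) runs real SQL against a real SQLite database, held in memory here only so the example is self-contained, while the confirmation email, non-essential to what’s being verified, is swapped for a `unittest.mock.Mock` passed in as a parameter. The test is pytest-style.

```python
import sqlite3
from unittest import mock


def send_email(to: str, body: str) -> None:
    """Non-essential side effect; a real system would call a mail service here."""
    raise RuntimeError("no mail server in the sandbox")


def place_order(conn, customer_id, amount, notify=send_email):
    """Core logic under test: record the order, debit the balance, then notify."""
    with conn:  # commits both writes together, or rolls back if either fails
        cur = conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        conn.execute(
            "UPDATE customers SET balance = balance - ? WHERE id = ?",
            (amount, customer_id),
        )
    (email,) = conn.execute(
        "SELECT email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    notify(email, f"Order {cur.lastrowid} confirmed")
    return cur.lastrowid


def test_place_order_against_real_db():
    # Real database engine, schema and SQL; only the email side effect is mocked.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, balance REAL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'ada@example.com', 100.0);
        """
    )
    fake_notify = mock.Mock()

    order_id = place_order(conn, customer_id=1, amount=30.0, notify=fake_notify)

    (balance,) = conn.execute("SELECT balance FROM customers WHERE id = 1").fetchone()
    (orders,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    assert balance == 70.0
    assert orders == 1
    fake_notify.assert_called_once_with("ada@example.com", f"Order {order_id} confirmed")
    conn.close()
```

Passing the side effect in as a parameter keeps the mocked boundary visible in the test itself; patching the module-level `send_email` with `mock.patch` is the usual alternative when the call can’t be injected.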
On PR size
I’m not a disciple of small PRs. I believe a PR can go big if the feature demands it. In my experience, breaking a feature into multiple smaller PRs costs more than a single, acceptably large PR, and with Claude Code the appetite for larger PRs has grown significantly. That said, it’s a personal preference, not something I strongly advocate.
End-to-end and manual QA
When the agent is done with the feature, I use one or both of the following, depending on the feature:
- End-to-end tests — to stress-test the feature against different real-world scenarios and its interaction with other parts of the system.
- Manual QA — to test the feature in its natural habitat, either locally or on staging.
Once these are done, I have the agent raise a PR. Then I iterate on the review feedback: addressing comments, updating tests, improving the E2E tests, and re-running manual QA until the PR is accepted and merged.
Throughout, the agent keeps updating the working document. When all the todos are checked off, it marks the working document as complete.
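If the working document keeps its todos as Markdown task-list items (an assumption; any checklist format would do), the condition that all todos are checked off can be tested mechanically. A minimal sketch; the `Status:` line and both helper names are hypothetical:

```python
import re

# Matches Markdown task-list items such as "- [ ] write tests" or "* [x] stub API".
TASK_RE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def open_todos(doc: str) -> list[str]:
    """Return the text of every unchecked task in the working document."""
    todos = []
    for line in doc.splitlines():
        m = TASK_RE.match(line)
        if m and m.group(1) == " ":
            todos.append(m.group(2).strip())
    return todos


def mark_complete_if_done(doc: str) -> str:
    """Rewrite the first (hypothetical) 'Status:' line once no open todos remain."""
    if open_todos(doc):
        return doc
    return re.sub(r"(?m)^Status:.*$", "Status: complete", doc, count=1)


sample = """Status: in progress
- [x] RED: tests for the parser
- [ ] GREEN: implement the parser
"""
print(open_todos(sample))  # ['GREEN: implement the parser']
```

Until the last box is ticked, `mark_complete_if_done` returns the document unchanged, so running it at the end of every session is harmless.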
Backup and sync across machines
I sync my Claude Code chats with CodeTeleport, a tool I built for this. It keeps my chats backed up — useful when I accidentally delete one — and lets me pick up the same conversation on a different machine, so I can work on the same repo with the same context wherever I am.
A caveat
This is an overview of how I generally develop, not a checklist I run every time. What I do and don’t do depends entirely on the feature. For bug fixes I follow a somewhat different approach that adds a debugging step — the rest of the process is similar.
P.S. My CLAUDE.md — the working-doc and TDD rules — lives in
the agentic repo, alongside other agentic-development bits. I’ll keep
adding to it as I go.