The TDD-with-agents rule

How I actually develop with Claude Code — the working document, the rule that tests can't change once written, reviewing big PRs, and catching the agent cutting corners.

In this post I’m sharing how I develop with Claude Code. It’s an extension of an earlier post, where I wrote about a year of moving from writing my own code to letting Claude Code do most of it.

Start with a working document

Once a proposal is kicked off, I get Claude Code building. The first step is to create a working document in the ~/.claude directory. This document records the current status of the work and the list of things to do. It also captures the decisions and pivots we made after the initial plan, and why we made them. This is what lets the agent retain context after compaction, or when starting a new chat session.

The instruction for creating and maintaining the working doc lives in my global ~/.claude/CLAUDE.md, so the agent does it on its own at the start of every session. I’ve published the relevant parts of my CLAUDE.md in a public repo, nawaaz-dev/agentic, so you can drop it into your own setup. It includes the working-document instructions and the TDD rules I’ll describe below.

Lay out the plan — and question it

The next step is to lay out the development plan clearly and add it to the working doc. I ask Claude to lay out the sub-tasks, their importance, their sequence, and their interdependencies. This helps me visualise the plan and build a mental model of what the work will look like.
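
To make the sequencing concrete, here is a minimal sketch of how a set of sub-tasks and their interdependencies can be turned into a valid build order. The sub-task names are hypothetical and the plan in a real working doc is prose, not a dictionary; the sketch only illustrates the idea that each sub-task should come after the ones it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks; each maps to the set of sub-tasks it depends on.
plan = {
    "schema migration": set(),
    "repository layer": {"schema migration"},
    "service logic": {"repository layer"},
    "api endpoint": {"service logic"},
    "docs": set(),
}


def build_order(tasks):
    """Return sub-tasks in an order where every dependency comes first."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(tasks).static_order())


order = build_order(plan)
```

If the agent's proposed sequence disagrees with an order like this, that mismatch is a useful prompt for asking it why.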

The crucial part here is not to blindly agree to whatever the agent puts in front of you. Instead, ask the agent why it wants to go this way. The reasoning almost always reveals something — either the agent has misinterpreted something, or there’s a misunderstanding about the implementation plan. Either way, it’s a good place to align myself with the agent, and vice versa.

Tests are the source of truth

For unit tests I follow TDD: the tests are written before the implementation, or alongside a stubbed one. The tests go in the RED commit and the implementation goes in the GREEN commit. I impose a strict rule: once the tests are written, they cannot be updated. The tests become the source of truth for the agent to follow. Sometimes I have another agent review the tests to make sure there are no gaps in coverage. Because the tests are committed first, any later change the agent makes to them shows up in the diff, and I scrutinise it. (The exact phrasing of this rule lives in the CLAUDE.md mentioned above.)

It’s common for the agent to forget proper mocks, and fixing those is the one kind of change I allow in the test files. The logic of a test cannot change unless the agent makes a strong case for why the change is necessary. If the reason is solid, I allow it. On long-running tasks the agent tends to cut corners, so I watch for this and steer it back when it goes astray.
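
One way to spot test edits after the RED commit is to list the files changed since that commit and flag anything that looks like a test. The sketch below is an assumption about how that could be automated, not the rule as written in the CLAUDE.md: the naming convention (test_*.py, *_test.py, or a tests/ directory) and the helper names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def is_test_file(path):
    """Assumed convention: test_*.py, *_test.py, or anything under a tests/ directory."""
    p = PurePosixPath(path)
    return (
        "tests" in p.parts
        or p.name.startswith("test_")
        or p.stem.endswith("_test")
    )


def touched_tests(changed_paths):
    """Return the changed paths that look like test files, sorted."""
    return sorted(p for p in changed_paths if is_test_file(p))


def changed_since(red_commit):
    """List files changed between the RED commit and HEAD (must run inside a git repo)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{red_commit}..HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Calling touched_tests(changed_since("<red-commit-sha>")) gives the list of test files to read closely. A non-empty result is not a failure on its own, since mock fixes are allowed; it is only a signal to look.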

Pause and review

Once the PR is ready — or has reached a point where I should look at it — I pause the agent and review the work. It’s always better to pause on big PRs and check in before things go too far. This lets me correct the agent’s mistakes before anything major goes awry.

One way I do this is by having the agent write integration tests. If the feature has reached a point where a chunk of it can be tested, I tell the agent to write an integration test for it. I find integration tests the best way to test a feature in a sandbox: the main changes run against real data, while non-essential parts are mocked. If everything works as expected, I sign off and we move forward. Otherwise, the agent corrects the issues.
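
As a rough illustration of that split, the sketch below runs a hypothetical feature against a real database engine while mocking a non-essential collaborator. Everything in it is assumed for the example: the place_order function, the items table, and the notifier are made up, and an in-memory SQLite database stands in for real data.

```python
import sqlite3
from unittest.mock import Mock


def place_order(conn, notifier, item, qty):
    """Hypothetical feature: decrement stock in the database, then notify."""
    row = conn.execute("SELECT stock FROM items WHERE name = ?", (item,)).fetchone()
    if row is None or row[0] < qty:
        return False
    conn.execute("UPDATE items SET stock = stock - ? WHERE name = ?", (qty, item))
    conn.commit()
    notifier.send(f"order placed: {qty} x {item}")
    return True


def run_integration_check():
    # The main change runs against a real SQL engine (in-memory SQLite here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, stock INTEGER)")
    conn.execute("INSERT INTO items VALUES ('widget', 5)")
    # The notifier is not essential to what is being verified, so it is mocked.
    notifier = Mock()
    ok = place_order(conn, notifier, "widget", 2)
    remaining = conn.execute(
        "SELECT stock FROM items WHERE name = 'widget'"
    ).fetchone()[0]
    return ok, remaining, notifier
```

The assertions then target the stored state and the single call made to the mock, rather than any implementation detail in between.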

On PR size

I’m not a disciple of small PRs. I believe a PR can go big if the feature demands it. Breaking a feature into multiple smaller PRs costs more than shipping a single, acceptably large one, and the appetite for large PRs has grown significantly with Claude Code. That said, it’s a personal preference, not something I strongly advocate.

End-to-end and manual QA

When the agent is done with the feature, I do one or both of the following, depending on the feature:

  1. End-to-end tests — to stress-test the feature against different real-world scenarios and its interaction with other parts of the system.
  2. Manual QA — to test the feature in its natural habitat, either locally or on staging.

Once these are done, I have the agent raise a PR. With feedback from the reviews, I address the comments, update the tests, improve the E2Es, and run manual QA until the PR is accepted and merged.

Throughout, the agent keeps updating the working document. When all the todos are checked off, it marks the working document as complete.
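
The completion check can be pictured as a scan over the todo list. The sketch below assumes the working document keeps its todos as Markdown-style checkboxes ("- [ ]" and "- [x]"); that format and the sample text are assumptions for illustration, not the layout prescribed in the CLAUDE.md.

```python
import re

# Matches "- [ ] item" or "- [x] item" (also "*" bullets and an uppercase X).
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def todo_status(doc_text):
    """Return (done, open_items) lists of todo items found in the document text."""
    done, open_items = [], []
    for line in doc_text.splitlines():
        m = CHECKBOX.match(line)
        if not m:
            continue
        target = done if m.group(1).lower() == "x" else open_items
        target.append(m.group(2).strip())
    return done, open_items


def is_complete(doc_text):
    """Complete means at least one todo exists and none are still open."""
    done, open_items = todo_status(doc_text)
    return bool(done) and not open_items


# Hypothetical working-doc excerpt.
sample = """Working doc: hypothetical feature
Status: in progress
- [x] write RED tests
- [ ] implement GREEN
"""
```

Requiring at least one checked item keeps an empty or freshly created document from being reported as complete.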

Backup and sync across machines

I sync my Claude Code chats with CodeTeleport, a tool I built for this. It keeps my chats backed up — useful when I accidentally delete one — and lets me pick up the same conversation on a different machine, so I can work on the same repo with the same context wherever I am.

A caveat

This is an overview of how I generally develop, not a checklist I run every time. What I do and don’t do depends entirely on the feature. For bug fixes I follow a somewhat different approach that adds a debugging step — the rest of the process is similar.


P.S. My CLAUDE.md — the working-doc and TDD rules — lives in the agentic repo, alongside other agentic-development bits. I’ll keep adding to it as I go.