Why small steps instead of one rewrite?

Small, test-verified steps stay reviewable and revertible. One large rewrite is hard to review and hard to bisect when something breaks.

Refactor · Medium risk

Refactor Loop

Refactor a module in small, test-verified steps without changing its external behavior.

What this Loop Engineering template does

Refactor the target module in small steps, keeping external behavior identical and the test suite green after each step.

Behavior must not change. Run the tests after each step. Prefer many small, reversible changes over one large rewrite — for example, splitting a large file into focused modules until each is under a size budget.
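
The size budget mentioned above could be checked mechanically after each step. The sketch below is one possible way to do that; the 300-line limit, the `*.py` pattern, and the function names are illustrative assumptions, since the template only speaks of an "agreed size budget".

```python
from pathlib import Path

# Hypothetical budget: the template does not fix a number.
MAX_LINES = 300


def count_lines(path: Path) -> int:
    """Return the number of lines in a text file."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def files_over_budget(
    root: Path, max_lines: int = MAX_LINES, pattern: str = "*.py"
) -> dict[str, int]:
    """Map each matching file under root that exceeds max_lines to its line count."""
    over = {}
    for path in sorted(root.rglob(pattern)):
        lines = count_lines(path)
        if lines > max_lines:
            over[str(path.relative_to(root))] = lines
    return over
```

An empty result from `files_over_budget(Path("src"))` would mean the split has reached its goal; a non-empty one lists the files that still need a further step.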

When to use it

Splitting a large file into focused modules

Improving structure without behavior change

Incremental, test-backed refactors

When not to use it

Refactors without a test safety net

Mixing a refactor with new features

Validation checks

✓ Full test suite passes after each step

✓ Public API and external behavior are unchanged

✓ No file exceeds the agreed size budget

✓ Type checks pass
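
The "public API is unchanged" check is the hardest of these to eyeball in a diff. One rough way to approximate it for a Python module is to snapshot the public names before the refactor and compare afterwards. This is only a sketch: `public_api` and `api_diff` are hypothetical helpers, and a matching snapshot does not prove that behavior is identical, which is still the job of the test suite.

```python
import inspect
from types import ModuleType


def public_api(module: ModuleType) -> dict[str, str]:
    """Map each public name in a module to a rough description of its shape."""
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, treat every non-underscore name as public.
        # Note this also picks up names the module merely imports.
        names = [name for name in vars(module) if not name.startswith("_")]
    snapshot = {}
    for name in sorted(names):
        obj = getattr(module, name)
        if inspect.isfunction(obj):
            snapshot[name] = f"function{inspect.signature(obj)}"
        elif inspect.isclass(obj):
            snapshot[name] = "class"
        else:
            snapshot[name] = type(obj).__name__
    return snapshot


def api_diff(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Return every name that was added, removed, or changed, as (before, after)."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

Taking the snapshot once before the first step and diffing after every step keeps the comparison anchored to the original interface rather than to the previous step.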

Boundaries & stop rule

! Do not change external behavior or public APIs

! Do not mix in new features

! Do not delete tests

! Do not refactor unrelated modules

Stop rule — Stop when the refactor goal is met and tests are green, or after 5 failed steps. With Claude /goal, bound it by adding “or stop after N turns” to the condition. If a step breaks behavior, revert it, summarize what broke, and recommend a smaller step or a human decision.
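
The stop rule and the revert fallback can be read as a small control loop. The sketch below shows that shape under stated assumptions: `Step`, `LoopResult`, and `run_refactor_loop` are hypothetical names, and `checks_pass` stands in for whatever runs the validation checks in a given project.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

MAX_FAILED_STEPS = 5  # taken from the stop rule


@dataclass
class Step:
    name: str
    apply: Callable[[], None]   # make one small change
    revert: Callable[[], None]  # undo exactly that change


@dataclass
class LoopResult:
    applied: list[str] = field(default_factory=list)
    reverted: list[str] = field(default_factory=list)
    stopped_early: bool = False


def run_refactor_loop(
    steps: Iterable[Step],
    checks_pass: Callable[[], bool],
    max_failed: int = MAX_FAILED_STEPS,
) -> LoopResult:
    """Apply steps one at a time; revert any step that fails validation."""
    result = LoopResult()
    for step in steps:
        step.apply()
        if checks_pass():
            result.applied.append(step.name)
            continue
        # The step broke something: undo it and record it for the summary.
        step.revert()
        result.reverted.append(step.name)
        if len(result.reverted) >= max_failed:
            result.stopped_early = True
            break
    return result
```

The names in `result.reverted` are the raw material for the "summarize what broke" part of the fallback; deciding on a smaller step or escalating to a human stays outside the loop.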

Copy the loop prompt

claude-goal.txt

/goal Refactor the target module in small steps, keeping external behavior identical and the test suite green after each step.
Work toward this goal until all validation checks pass or the stop rule is reached.
Loop cycle:
1. Discovery — Read the latest signal for this template before acting: CI output, issue detail, review comment, dataset report, or content brief.
2. Handoff — Hand the work to one agent in an isolated branch, worktree, or clearly scoped session. Keep final approval with a human.
3. Verification — Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
4. Persistence — Save a short run note with the signal reviewed, actions taken, validation result, and next recommended step.
5. Scheduling — Run manually until the loop is reliable; only then consider a scheduled or event-triggered run.
Context:
Behavior must not change. Run the tests after each step. Prefer many small, reversible changes over one large rewrite — for example, splitting a large file into focused modules until each is under a size budget.
Validation:
Full test suite passes after each step
Public API and external behavior are unchanged
No file exceeds the agreed size budget
Type checks pass
Independent checker:
Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
Boundaries:
Do not change external behavior or public APIs
Do not mix in new features
Do not delete tests
Do not refactor unrelated modules
Stop rule:
Stop when the refactor goal is met and tests are green, or after 5 failed steps. With Claude /goal, bound it by adding “or stop after N turns” to the condition.
Maximum iterations: 6
Budget:
Stop before exceeding the agreed per-run token budget.
Human approval:
Required before merge, deploy, delete, purchase, or external communication.
Fallback:
If a step breaks behavior, revert it, summarize what broke, and recommend a smaller step or a human decision.
Do not delete tests, bypass checks, or modify unrelated files just to satisfy the validation condition. If blocked, stop and summarize the blocker, attempted fixes, and recommended next action.
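
Outside the prompt itself, the run note from step 4 (Persistence) could be kept in a simple append-only log. The following is a minimal sketch, assuming a JSON Lines file; the `RunNote` field names mirror the four items the prompt lists, but the class, the helpers, and the file format are illustrative choices, not part of the template.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class RunNote:
    signal_reviewed: str       # e.g. the CI output or review comment that was read
    actions_taken: list[str]   # steps applied or reverted in this run
    validation_result: str     # outcome of the validation checks
    next_step: str             # recommended follow-up


def append_run_note(note: RunNote, log_path: Path) -> None:
    """Append one note as a JSON line so earlier runs are never overwritten."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(note)) + "\n")


def read_run_notes(log_path: Path) -> list[RunNote]:
    """Load every note from the log, oldest first."""
    with log_path.open(encoding="utf-8") as handle:
        return [RunNote(**json.loads(line)) for line in handle if line.strip()]
```

Reading the most recent note at the start of the next run gives the Discovery step something concrete to work from.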

Failure modes to watch

Changing behavior while “just refactoring”

One giant rewrite that is hard to review

Dropping tests to make a step pass

Scope creep into unrelated code
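
Two of these failure modes, scope creep and dropped tests, leave traces in the list of changed files, so they can be screened for before the independent review. The sketch below assumes file lists in the style of `git diff --name-only` and `git diff --name-status`, and assumes test files are named `test_*`; both helpers are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return changed paths that are not inside any of the allowed directories."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed:
        path = PurePosixPath(name)
        # Compare whole path components so "src/billing_extra" is not
        # mistaken for something inside "src/billing".
        if not any(base == path or base in path.parents for base in allowed):
            flagged.append(name)
    return flagged


def deleted_tests(name_status_lines: list[str]) -> list[str]:
    """Pick deleted test files out of lines shaped like 'D<TAB>path'."""
    deleted = []
    for line in name_status_lines:
        status, _, name = line.partition("\t")
        # Assumption: test files follow the test_*.py naming convention.
        if status == "D" and PurePosixPath(name).name.startswith("test_"):
            deleted.append(name)
    return deleted
```

A non-empty result from either function is a reason to reject the step, not to adjust the check.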

Loop Engineering FAQ

Refactor versus bug fix: a refactor must not change behavior, so the tests that pass before must still pass after. A bug fix deliberately changes behavior and proves it with a new regression test.
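
A small, made-up example may make the distinction concrete. All three functions below are hypothetical: the first two differ only in structure, while the third deliberately changes what happens for negative prices.

```python
def total_cents_before(prices_cents, tax_percent):
    # Original: manual accumulation, then tax.
    total = 0
    for price in prices_cents:
        total = total + price
    tax = total * tax_percent // 100
    return total + tax


def total_cents_after(prices_cents, tax_percent):
    # Refactor: clearer structure, identical results for every input.
    subtotal = sum(prices_cents)
    return subtotal + subtotal * tax_percent // 100


def total_cents_fixed(prices_cents, tax_percent):
    # Bug fix: behavior changes on purpose, negative prices are now rejected.
    if any(price < 0 for price in prices_cents):
        raise ValueError("prices must not be negative")
    return total_cents_after(prices_cents, tax_percent)


# Existing test: must pass unchanged before and after the refactor.
assert total_cents_before([1000, 250], 8) == 1350
assert total_cents_after([1000, 250], 8) == 1350


def test_negative_price_is_rejected():
    # New regression test that accompanies the bug fix.
    try:
        total_cents_fixed([-5], 8)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative price")
```

The refactor ships with no new tests because nothing observable changed; the bug fix ships with `test_negative_price_is_rejected` because something did.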