Can it skip a flaky test?

No — skipping or deleting tests is forbidden. Flaky infrastructure is out of scope; the loop stops and reports instead.

Templates/CI Fix

CI FixMedium risk

CI Failure Fix Loop

Fix failing CI by reading logs, applying minimal changes, and rerunning validation.

What this Loop Engineering template does

Make the failing CI checks pass with the smallest safe change, without weakening the test suite.

CI is red. Read the failing logs and reproduce locally before editing. Prefer the minimal change.

When to use it

Concrete, reproducible CI failures

Lint, type, or test breakages

Build errors

When not to use it

Flaky infrastructure

Secrets or permissions failures

Product decisions

Validation checks

validation

✓CI command passes locally

✓The same failing test now passes

✓No unrelated tests are skipped or deleted

Boundaries & stop rule

!Do not remove failing tests

!Do not disable lint or type checks

!Do not change public APIs unless required and documented

Stop rule — Stop when the failing checks pass locally or after 5 failed attempts. If blocked, summarize the failing check, the root-cause hypothesis, and the recommended human decision.

Copy the loop prompt

claude-goal.txt

/goal Make the failing CI checks pass with the smallest safe change, without weakening the test suite.
Work toward this goal until all validation checks pass or the stop rule is reached.
Loop cycle:
1. Discovery — Read the latest signal for this template before acting: CI output, issue detail, review comment, dataset report, or content brief.
2. Handoff — Hand the work to one agent in an isolated branch, worktree, or clearly scoped session. Keep final approval with a human.
3. Verification — Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
4. Persistence — Save a short run note with the signal reviewed, actions taken, validation result, and next recommended step.
5. Scheduling — Run manually until the loop is reliable; only then consider a scheduled or event-triggered run.
Context:
CI is red. Read the failing logs and reproduce locally before editing. Prefer the minimal change.
Validation:
CI command passes locally
The same failing test now passes
No unrelated tests are skipped or deleted
Independent checker:
Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
Boundaries:
Do not remove failing tests
Do not disable lint or type checks
Do not change public APIs unless required and documented
Stop rule:
Stop when the failing checks pass locally or after 5 failed attempts.
Maximum iterations: 5
Budget:
Stop before exceeding the agreed per-run token budget.
Human approval:
Required before merge, deploy, delete, purchase, or external communication.
Fallback:
If blocked, summarize the failing check, the root-cause hypothesis, and the recommended human decision.
Do not delete tests, bypass checks, or modify unrelated files just to satisfy the validation condition. If blocked, stop and summarize the blocker, attempted fixes, and recommended next action.

Failure modes to watch

Agent disables or deletes the failing test

Agent masks the error instead of fixing it

Agent changes unrelated modules

Agent loops on the same error

Loop Engineering FAQ

This loop targets one red CI run and the smallest fix. The PR Babysitter watches a whole pull request including review comments.