What about missing values?

Report them, and if you impute, label the imputation. Silently filling gaps hides uncertainty and can corrupt downstream analysis.
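As a minimal sketch of reporting and labeling, assume rows are dictionaries read from a CSV and that a few string markers stand for "missing". The helper names `report_missing` and `impute_with_label`, the `MISSING` marker set, the `_imputed` flag suffix, and the choice of a median fill are all illustrative assumptions, not part of the template.

```python
from statistics import median

MISSING = {"", "NA", "N/A", "null", None}  # assumed missing-value markers


def report_missing(rows, columns):
    """Count missing cells per column so gaps are reported, not hidden."""
    return {col: sum(1 for row in rows if row.get(col) in MISSING) for col in columns}


def impute_with_label(rows, column):
    """Fill missing numeric cells with the column median and flag each filled cell."""
    observed = [float(row[column]) for row in rows if row.get(column) not in MISSING]
    if not observed:
        raise ValueError(f"no observed values in {column!r}; cannot impute")
    fill = median(observed)
    flag = f"{column}_imputed"
    labeled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        was_missing = row.get(column) in MISSING
        new_row[column] = fill if was_missing else float(row[column])
        new_row[flag] = was_missing  # the label that makes the imputation visible
        labeled.append(new_row)
    return labeled


raw_rows = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": ""},
    {"id": "3", "age": "40"},
]
missing_report = report_missing(raw_rows, ["id", "age"])
cleaned_rows = impute_with_label(raw_rows, "age")
```

The flag column travels with the data, so a downstream analyst can exclude or reweight imputed cells instead of mistaking them for observations.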

Data · Low risk

Data Cleaning Loop

Clean messy datasets with repeatable validation and artifact outputs.

What this Loop Engineering template does

Produce a clean dataset and a reproducible cleaning script, documenting every transformation.

Never overwrite the raw data. Label any imputed values. Keep the script reproducible.

When to use it

Messy CSV or table cleanup

Schema normalization

Reproducible pipelines

When not to use it

Ambiguous labels needing human judgment

Data that must not be modified

Unverifiable sources

Validation checks

✓ Missing values are reported

✓ Schema is documented

✓ Cleaning script is reproducible

✓ Final dataset passes validation checks
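Most of these checks can be automated and run by an independent checker. The sketch below is one possible shape, assuming rows are dictionaries. `SCHEMA`, `clean`, `digest`, and `validate` are hypothetical names, and the cleaning step shown (trimming whitespace, casting ids) only stands in for whatever the real script does. Reproducibility is approximated by running the cleaning step twice and comparing fingerprints.

```python
import hashlib
import json

# Hypothetical documented schema: column name -> expected Python type.
SCHEMA = {"id": int, "city": str}


def clean(raw_rows):
    """Example cleaning step: trim whitespace, cast ids. Returns new rows."""
    return [
        {"id": int(r["id"].strip()), "city": r["city"].strip().title()}
        for r in raw_rows
    ]


def digest(rows):
    """Stable fingerprint of a dataset, used to compare two runs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def validate(raw_rows, missing_report):
    """Return check name -> bool for the checks that can be automated."""
    first = clean(raw_rows)
    second = clean(raw_rows)
    rows_match_schema = all(
        set(row) == set(SCHEMA)
        and all(isinstance(row[col], typ) for col, typ in SCHEMA.items())
        for row in first
    )
    return {
        "missing_values_reported": set(missing_report) == set(SCHEMA),
        "rows_match_schema": rows_match_schema,
        "script_reproducible": digest(first) == digest(second),
    }


raw = [{"id": " 1", "city": " austin "}, {"id": "2", "city": "BOSTON"}]
results = validate(raw, {"id": 0, "city": 0})
all_passed = all(results.values())
```

Returning one named boolean per check, rather than a single pass/fail, lets the run note record exactly which check failed.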

Boundaries & stop rule

! Do not silently delete rows

! Do not overwrite raw data

! Do not invent missing values without labeling imputation

Stop rule — Stop when the cleaned dataset passes validation and the script reproduces it. If labels are ambiguous, stop and ask for a human decision rather than guessing.
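One way to honor the "do not silently delete rows" boundary is to route every rejected row to a separate list with a reason, then confirm that no row went unaccounted for. This is a minimal sketch assuming rows are dictionaries; `split_rows`, `has_id`, and the rejection rule (a blank `id`) are hypothetical.

```python
def split_rows(rows, is_valid):
    """Partition rows into kept and rejected; every rejection carries a reason."""
    kept, rejected = [], []
    for index, row in enumerate(rows):
        ok, reason = is_valid(row)
        if ok:
            kept.append(row)
        else:
            rejected.append({"source_row": index, "reason": reason, "row": row})
    # Row accounting: nothing disappears without a record.
    assert len(kept) + len(rejected) == len(rows)
    return kept, rejected


def has_id(row):
    """Hypothetical rule: a row needs a non-blank id."""
    if row.get("id", "").strip():
        return True, ""
    return False, "blank id"


rows = [{"id": "1"}, {"id": " "}, {"id": "3"}]
kept, rejected = split_rows(rows, has_id)
```

Writing the `rejected` list out as its own artifact next to the cleaned dataset gives a reviewer something concrete to inspect before approving the run.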

Copy the loop prompt

claude-goal.txt

/goal Produce a clean dataset and a reproducible cleaning script, documenting every transformation.
Work toward this goal until all validation checks pass or the stop rule is reached.
Loop cycle:
1. Discovery — Read the latest signal for this template before acting: CI output, issue detail, review comment, dataset report, or content brief.
2. Handoff — Hand the work to one agent in an isolated branch, worktree, or clearly scoped session. Keep final approval with a human.
3. Verification — Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
4. Persistence — Save a short run note with the signal reviewed, actions taken, validation result, and next recommended step.
5. Scheduling — Run manually until the loop is reliable; only then consider a scheduled or event-triggered run.
Context:
Never overwrite the raw data. Label any imputed values. Keep the script reproducible.
Validation:
Missing values are reported
Schema is documented
Cleaning script is reproducible
Final dataset passes validation checks
Independent checker:
Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
Boundaries:
Do not silently delete rows
Do not overwrite raw data
Do not invent missing values without labeling imputation
Stop rule:
Stop when the cleaned dataset passes validation and the script reproduces it.
Maximum iterations: 4
Budget:
Stop before exceeding the agreed per-run token budget.
Human approval:
Required before merge, deploy, delete, purchase, or external communication.
Fallback:
If labels are ambiguous, stop and ask for a human decision rather than guessing.
Do not delete tests, bypass checks, or modify unrelated files just to satisfy the validation condition. If blocked, stop and summarize the blocker, attempted fixes, and recommended next action.
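The stop rule, iteration cap, and fallback in the prompt amount to a bounded retry loop. The sketch below shows that control flow only. `run_loop`, `attempt`, and `validate` are hypothetical callables standing in for the agent's cleaning pass and the independent checker, and the ambiguous-label fallback is modeled as an exception.

```python
class NeedsHumanDecision(Exception):
    """Raised by a cleaning pass when labels are ambiguous."""


def run_loop(attempt, validate, max_iterations=4):
    """Run passes until validation passes, the cap is hit, or a human is needed."""
    notes = []  # short run notes, one per iteration
    for iteration in range(1, max_iterations + 1):
        try:
            artifact = attempt(iteration)
        except NeedsHumanDecision as blocker:
            notes.append(
                {"iteration": iteration, "result": "blocked", "detail": str(blocker)}
            )
            return "needs_human", notes
        passed = validate(artifact)
        notes.append({"iteration": iteration, "result": "pass" if passed else "fail"})
        if passed:
            return "passed", notes
    return "max_iterations_reached", notes


# Toy example: the second pass leaves no errors, so the loop stops there.
status, notes = run_loop(
    lambda i: {"errors": max(0, 2 - i)},
    lambda artifact: artifact["errors"] == 0,
)
```

Every exit path returns the notes, so a blocked or exhausted run still leaves a record of what was tried.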

Failure modes to watch

Silent row deletion

Overwritten raw data

Unlabeled imputation

Non-reproducible cleanup

Loop Engineering FAQ

If a cleaning step turns out wrong, you need the original to start over. Cleaning should always produce a new artifact, never overwrite the source.
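A minimal sketch of that rule, assuming CSV files on local disk: `write_cleaned` is a hypothetical helper that refuses to write over the source and confirms the raw file's checksum is unchanged afterwards. The file names and sample rows are made up for the demonstration.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_cleaned(raw_path, out_path, rows, fieldnames):
    """Write cleaned rows to a new file; never touch the raw source."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if out_path.resolve() == raw_path.resolve():
        raise ValueError("output path must differ from the raw data path")
    before = sha256_of(raw_path)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    if sha256_of(raw_path) != before:
        raise RuntimeError("raw data changed during cleaning")
    return out_path


with tempfile.TemporaryDirectory() as workdir:
    raw = Path(workdir) / "raw.csv"
    raw.write_text("id,city\n1, austin \n", encoding="utf-8")
    raw_checksum = sha256_of(raw)
    out = write_cleaned(
        raw, Path(workdir) / "clean.csv", [{"id": "1", "city": "Austin"}], ["id", "city"]
    )
    cleaned_lines = out.read_text(encoding="utf-8").splitlines()
    raw_unchanged = sha256_of(raw) == raw_checksum
    try:
        write_cleaned(raw, raw, [], ["id", "city"])
        overwrite_blocked = False
    except ValueError:
        overwrite_blocked = True
```

Recording the raw checksum in the run note also lets a later run confirm it started from the same source file.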