Data · Low risk
Data Cleaning Loop
Clean messy datasets with repeatable validation and artifact outputs.
What this Loop Engineering template does
Produce a clean dataset and a reproducible cleaning script, documenting every transformation.
Never overwrite the raw data. Label any imputed values. Keep the script reproducible.
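One way to honor "label any imputed values" is to add a flag column next to every column that gets filled. The sketch below is illustrative only: the `impute_with_flag` helper, the `age` column, the `_imputed` suffix, and the choice of median fill are assumptions, not part of the template. It works on copies of the rows, so the raw rows stay untouched.

```python
import csv
import io
import statistics

def impute_with_flag(rows, column):
    """Fill blank cells in `column` with the median and mark each filled row.

    Hypothetical helper: median fill is just one possible strategy.
    """
    observed = [float(r[column]) for r in rows if r[column].strip() != ""]
    fill = statistics.median(observed)
    cleaned = []
    for r in rows:
        out = dict(r)  # copy, so the raw rows are never mutated
        missing = r[column].strip() == ""
        if missing:
            out[column] = str(fill)
        # Explicit label: "1" means the value was imputed, "0" means observed.
        out[f"{column}_imputed"] = "1" if missing else "0"
        cleaned.append(out)
    return cleaned

# Tiny in-memory stand-in for a raw CSV file (made-up data).
raw_text = "id,age\n1,30\n2,\n3,40\n"
raw_rows = list(csv.DictReader(io.StringIO(raw_text)))
clean_rows = impute_with_flag(raw_rows, "age")
```

In a real run the cleaned rows would be written to a new file beside the raw one, never to the raw path itself.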
When to use it
Messy CSV or table cleanup
Schema normalization
Reproducible pipelines
When not to use it
Ambiguous labels needing human judgment
Data that must not be modified
Unverifiable sources
Validation checks
✓ Missing values are reported
✓ Schema is documented
✓ Cleaning script is reproducible
✓ Final dataset passes validation checks
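The first two checks and the last one can be approximated with a small script. This is a minimal sketch under stated assumptions: the `SCHEMA` mapping, the column names, and the `missing_report` and `validate` helpers are hypothetical, and a real project might use a dedicated validation library instead.

```python
import csv
import io

# Hypothetical documented schema: column name -> parser for a non-empty cell.
SCHEMA = {"id": int, "age": float}

def missing_report(rows, columns):
    """Count blank cells per column so missing values are reported, not hidden."""
    return {
        c: sum(1 for r in rows if (r.get(c) or "").strip() == "")
        for c in columns
    }

def validate(rows, schema):
    """Return a list of human-readable problems; an empty list means pass."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for col, parse in schema.items():
            value = (row.get(col) or "").strip()
            if value == "":
                problems.append(f"row {i}: {col} is missing")
                continue
            try:
                parse(value)
            except ValueError:
                problems.append(
                    f"row {i}: {col}={value!r} is not a valid {parse.__name__}"
                )
    return problems

# Made-up sample with one blank cell and one unparseable cell.
sample = "id,age\n1,30\n2,\n3,abc\n"
rows = list(csv.DictReader(io.StringIO(sample)))
report = missing_report(rows, SCHEMA)
problems = validate(rows, SCHEMA)
```

Here `report` counts one blank `age` cell and `problems` lists both bad rows, so the dataset would not yet count as passing.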
Boundaries & stop rule
! Do not silently delete rows
! Do not overwrite raw data
! Do not invent missing values without labeling imputation
Stop rule — Stop when the cleaned dataset passes validation and the script reproduces it. If labels are ambiguous, stop and ask for a human decision rather than guessing.
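The "script reproduces it" half of the stop rule can be tested by running the cleaning step twice on the same raw input and comparing digests of the output. The sketch below assumes a deterministic, hypothetical `clean` function that only trims cells and lower-cases headers; the point is the comparison, not the cleaning logic.

```python
import csv
import hashlib
import io

def clean(raw_text):
    """Hypothetical deterministic cleaning step: trim cells, lower-case headers."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [h.strip().lower() for h in next(reader)]
    out = io.StringIO()
    # A fixed line terminator keeps the output byte-identical across platforms.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()

def digest(text):
    """SHA-256 of the cleaned output, used as a reproducibility fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Made-up raw input with stray whitespace.
raw = "ID, Name \n1, Ada \n2, Grace\n"
first = clean(raw)
second = clean(raw)
reproducible = digest(first) == digest(second)
```

If the digests differ between runs, something nondeterministic (unseeded randomness, timestamps, unordered iteration) has probably leaked into the script, and the loop should not stop yet.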
Copy the loop prompt
claude-goal.txt
/goal Produce a clean dataset and a reproducible cleaning script, documenting every transformation.
Work toward this goal until all validation checks pass or the stop rule is reached.

Loop cycle:
1. Discovery: Read the latest signal for this template before acting: CI output, issue detail, review comment, dataset report, or content brief.
2. Handoff: Hand the work to one agent in an isolated branch, worktree, or clearly scoped session. Keep final approval with a human.
3. Verification: Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.
4. Persistence: Save a short run note with the signal reviewed, actions taken, validation result, and next recommended step.
5. Scheduling: Run manually until the loop is reliable; only then consider a scheduled or event-triggered run.

Context:
Never overwrite the raw data. Label any imputed values. Keep the script reproducible.

Validation:
Missing values are reported
Schema is documented
Cleaning script is reproducible
Final dataset passes validation checks

Independent checker:
Use an independent review pass to confirm the result, inspect the diff or artifact, and reject shortcut work.

Boundaries:
Do not silently delete rows
Do not overwrite raw data
Do not invent missing values without labeling imputation

Stop rule:
Stop when the cleaned dataset passes validation and the script reproduces it.

Maximum iterations: 4

Budget:
Stop before exceeding the agreed per-run token budget.

Human approval:
Required before merge, deploy, delete, purchase, or external communication.

Fallback:
If labels are ambiguous, stop and ask for a human decision rather than guessing.

Do not delete tests, bypass checks, or modify unrelated files just to satisfy the validation condition. If blocked, stop and summarize the blocker, attempted fixes, and recommended next action.
Failure modes to watch
Silent row deletion
Overwritten raw data
Unlabeled imputation
Non-reproducible cleanup
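Two of these failure modes, silent row deletion and overwritten raw data, can be caught with cheap guards. The following is a rough sketch, not a prescribed implementation: the `split_rows` and `fingerprint` helpers, the `reject_reason` field, and the sample rows are all hypothetical. Rejected rows go to a quarantine list instead of vanishing, and a fingerprint of the raw rows is compared before and after cleaning.

```python
import hashlib

def split_rows(rows, is_valid):
    """Partition rows into kept and quarantined; nothing is discarded."""
    kept, quarantined = [], []
    for row in rows:
        if is_valid(row):
            kept.append(row)
        else:
            # Copy the row and record why it was set aside.
            quarantined.append({**row, "reject_reason": "failed is_valid"})
    return kept, quarantined

def fingerprint(rows):
    """Hash of the rows' repr, used to detect in-place changes to raw data."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()

# Made-up raw rows; the negative age is treated as invalid for illustration.
raw_rows = [
    {"id": "1", "age": "30"},
    {"id": "2", "age": "-5"},
    {"id": "3", "age": "41"},
]
before = fingerprint(raw_rows)
kept, quarantined = split_rows(raw_rows, lambda r: int(r["age"]) >= 0)
after = fingerprint(raw_rows)

# Row accounting: every input row is either kept or quarantined.
assert len(kept) + len(quarantined) == len(raw_rows)
# Raw data guard: the raw rows were not modified along the way.
assert before == after
```

For files on disk, the same idea applies by hashing the raw file's bytes before and after the run.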
Loop Engineering FAQ
Why should the raw data never be overwritten?
If a cleaning step turns out to be wrong, you need the original to start over. Cleaning should always produce a new artifact and never overwrite the source.