## Guard Pattern & Noise-Aware Verification

### Guard Pattern (Regression Prevention)
The verify command measures improvement. The guard command confirms nothing else broke.
Separation of concerns:
- Verify = "did the target metric improve?"
- Guard = "did anything else break?"
#### How It Works
- Baseline run: guard command must exit 0 before loop starts (establishes clean baseline)
- After verify succeeds (Phase 5.5), run guard command — BEFORE the keep/discard decision
- If guard exits non-zero: trigger recovery flow
#### Guard Recovery Flow

```
guard fails →
  revert to previous commit →
    rework attempt 1 (different approach) →
      if guard fails again →
        rework attempt 2 (minimal change) →
          if guard fails again →
            discard (log status: guard-failed)
```
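A minimal sketch of this flow, assuming each attempt lands as a commit (so `git reset --hard HEAD~1` undoes it) and hypothetical callables that apply the two reworks:

```python
import subprocess

def guard_passes(guard_cmd: str) -> bool:
    """True if the guard command exits 0."""
    return subprocess.run(guard_cmd, shell=True).returncode == 0

def guard_or_recover(guard_cmd: str, reworks) -> str:
    """Run the guard; on failure, revert and try up to two reworks.

    `reworks` holds at most two callables: a different approach
    first, then a minimal change. Both are hypothetical hooks.
    """
    if guard_passes(guard_cmd):
        return "kept"
    for apply_rework in reworks[:2]:
        subprocess.run("git reset --hard HEAD~1", shell=True)  # revert attempt
        apply_rework()  # assumed to produce a new commit
        if guard_passes(guard_cmd):
            return "kept"
    subprocess.run("git reset --hard HEAD~1", shell=True)  # discard final rework
    return "guard-failed"  # logged status
```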
Rule: If guard cannot pass at baseline, fix it before starting the loop — never relax the guard.
Rule: Guard files are READ-ONLY. Never modify test files, spec files, or guard scripts as part of an optimization attempt.
Rule: Guard failure means the optimization is wrong, not that the guard is wrong.
#### Common Guard Commands

| Stack | Guard Command | Notes |
|---|---|---|
| Node.js | `npm test` | Runs Jest/Vitest suite |
| Python | `pytest` | Full test suite |
| Go | `go test ./...` | All packages |
| Rust | `cargo test` | Unit + integration |
| TypeScript | `tsc --noEmit && npm test` | Type check, then tests |
| Any | `npm run lint && npm test` | Lint + tests combined |
#### Guard Command Selection Heuristic

- If optimizing runtime code → guard = full test suite
- If optimizing build/bundle → guard = `tsc --noEmit` + smoke test
- If optimizing an ML pipeline → guard = test suite + data schema validation
- Default when unsure → `npm test` / `pytest` / `go test ./...` (see the lookup sketch below)
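As a sketch, this heuristic reduces to a plain lookup. The category keys and the smoke/validation scripts here are hypothetical, not part of ck:loop:

```python
# Hypothetical mapping from optimization target to guard command.
GUARD_BY_TARGET = {
    "runtime": "npm test",                            # full test suite
    "build":   "tsc --noEmit && npm run smoke",       # hypothetical smoke script
    "ml":      "pytest && python validate_schema.py", # hypothetical schema check
}

def pick_guard(target: str) -> str:
    return GUARD_BY_TARGET.get(target, "npm test")  # default when unsure
```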
### Noise-Aware Verification
Noisy metrics produce false positives. A "5% improvement" that's really measurement variance leads to keeping bad changes.
#### Noise Levels
| Level | Description | Strategy |
|---|---|---|
| Low | Deterministic output (LOC, type errors, lint count) | Single run, trust result |
| Medium | Slight variance (build time ±5%, unit test timing) | 2 runs, use worse result |
| High | High variance (API latency, benchmark, ML accuracy) | 3-5 runs, use median |
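A sketch of these strategies as a single aggregation step, assuming each verify run emits one number and lower is better:

```python
import statistics

def aggregate(samples: list[float], noise: str) -> float:
    """Combine verify runs according to the noise-level table above."""
    if noise == "low":     # deterministic: trust the single run
        return samples[0]
    if noise == "medium":  # 2 runs, use the worse (higher) result
        return max(samples[:2])
    return statistics.median(samples)  # high: 3-5 runs, take the median
```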
#### Multi-Run Median (High Noise)

```python
import statistics
import subprocess

def measure_median(verify_cmd: str, runs: int = 5) -> float:
    """Run the verify command 3-5 times; return the median printed metric."""
    samples = [float(subprocess.run(verify_cmd, shell=True, capture_output=True,
                                    text=True).stdout.strip())
               for _ in range(runs)]
    return statistics.median(samples)
```

Use the median, not the mean: the median is resistant to single outlier spikes.
#### Min-Delta Threshold

Only keep an attempt if the improvement exceeds the threshold:

```python
improvement = previous_best - new_metric  # for "lower is better" metrics
if improvement < min_delta:
    status = "no-op"  # not kept, but not counted as a failure either
```
Default thresholds by noise level (worked example after this list):
- Low noise: 0 (any improvement counts)
- Medium noise: 1-2% of baseline
- High noise: 3-5% of baseline
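As a worked example with assumed numbers: a 10 000 ms baseline build at medium noise, using the 2% upper bound, gives min_delta = 200 ms (consistent with the `min_delta: 200` in the build-time config below, if the build takes about 10 s):

```python
def default_min_delta(baseline: float, noise: str) -> float:
    """Upper end of each default range, as a fraction of the baseline metric."""
    return baseline * {"low": 0.0, "medium": 0.02, "high": 0.05}[noise]

print(default_min_delta(10_000, "medium"))  # 200.0 -> 200 ms for a 10 s build
```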
#### Confirmation Run

For high-stakes metrics (the final 3 iterations, or any improvement > 20%), re-verify before committing:

```
candidate looks good →
  run verify one more time →
    compare to the initial measurement this iteration →
      if within 2% → confirm keep
      if outside 2% → treat as medium noise, average the two
```
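A sketch of that check, assuming the metric is a positive number for which relative difference is meaningful:

```python
def confirm_metric(initial: float, recheck: float) -> float:
    """Keep the metric if the re-verify agrees within 2%; otherwise
    treat it as medium noise and average the two measurements."""
    if abs(recheck - initial) / initial <= 0.02:
        return initial                # confirmed: keep as measured
    return (initial + recheck) / 2.0  # noisy: average the two runs
```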
#### Environment Pinning (User Responsibility)
ck:loop cannot control the environment. User must ensure:
- Fixed random seeds for ML workloads (see the sketch after this list)
- Warmed caches (or cold caches) consistently
- No background processes competing for CPU
- Same input data across runs
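For the seed item, a minimal Python sketch; NumPy is assumed here, and framework-specific seeds (e.g. `torch.manual_seed`) would be added the same way:

```python
import random

import numpy as np

SEED = 42  # any fixed value; keep it identical across runs
random.seed(SEED)
np.random.seed(SEED)
# Seed your ML framework as well, e.g. torch.manual_seed(SEED)
```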
#### Config Examples

Low noise (lint errors):

```
verify: eslint src --format json | jq '[.[] | .errorCount] | add'
noise: low
min_delta: 0
guard: npm test
```

Medium noise (build time):

```
verify: { start=$(date +%s%N); npm run build; echo $(( ($(date +%s%N) - start) / 1000000 )); }
noise: medium
runs: 2
min_delta: 200  # ms
guard: tsc --noEmit
```

High noise (API latency):

```
verify: wrk -t2 -c10 -d10s http://localhost:3000/api/health | grep 'Latency' | awk '{print $2}' | sed 's/ms//'
noise: high
runs: 5
min_delta: 5  # ms
guard: npm test
```