This commit is contained in:
2026-04-12 01:06:31 +07:00
commit 10d660cbcb
1066 changed files with 228596 additions and 0 deletions

View File

@@ -0,0 +1,193 @@
# Autonomous Loop Protocol
8-phase specification executed each iteration. Complete phases in order — no skipping.
---
## Phase 0: Precondition Checks (first iteration only)
Run once before the loop starts. Abort with clear error if any check fails.
1. Confirm current directory is a git repository (`git rev-parse --git-dir`)
2. Confirm working tree is clean (`git status --porcelain` → empty output)
3. Confirm current HEAD is on a named branch (not detached)
4. Check no stale lock files (`loop-results.tsv.lock`)
5. Resolve scope glob — confirm at least one file matches
6. Dry-run verify command — confirm it exits 0 and outputs a number
7. Dry-run guard command (if set) — confirm it exits 0
8. Record **baseline metric** as iteration 0 in `loop-results.tsv`
---
## Phase 1: Review
Read context before every iteration — do not skip even if "nothing changed".
```bash
git log --oneline -20 # recent history
git diff HEAD~1 # last change detail
cat loop-results.tsv # full results so far
```
Extract patterns:
- Which file types / functions yielded improvements?
- Which changes were consistently discarded?
- Is the metric trending, plateauing, or oscillating?
---
## Phase 2: Ideate
Pick **ONE** focused change. Rules:
- **Exploit** patterns from successful iterations
- **Avoid** repeating failed patterns (same file + same approach)
- **Atomicity test:** describe the change in one sentence. If the sentence contains "and", split into two iterations.
- Prefer high-leverage targets (files with low coverage, large bundle contributors, most lint errors)
- When stuck (3+ consecutive discards on same area), pivot to a different file or technique
---
## Phase 3: Modify
- Edit files within `Scope` only
- **Never** modify files referenced by the `Guard` command
- Ensure syntax is valid after edit (run `tsc --noEmit` or equivalent linter for the language)
- Keep changes minimal — one logical unit
---
## Phase 4: Commit
Commit **before** running verification. Git is the undo mechanism, not a post-hoc save.
```bash
git add <changed files>
git commit -m "loop(iter-N): <one-line description>"
```
Convention: `loop(iter-N):` prefix enables log filtering later.
---
## Phase 5: Verify
Run the configured verify command. Extract the numeric result.
```bash
RESULT=$(eval "$VERIFY_CMD")
DELTA=$(echo "$RESULT - $PREV_METRIC" | bc)
```
### Crash Recovery
| Outcome | Meaning | Action |
|---------|---------|--------|
| Exit 0, number printed | Success | Proceed to Phase 5.5 / 6 |
| Exit 0, no number | Bad command | Log `error:no-number`, revert, fix verify cmd |
| Exit non-zero | Verify crash | Log `error:verify-crash`, revert, treat as discard |
| Timeout (>30s) | Too slow | Log `error:timeout`, abort loop, surface to user |
---
## Phase 5.5: Guard (optional — skip if no Guard configured)
Run guard command after verify.
```bash
eval "$GUARD_CMD"
GUARD_EXIT=$?
```
| Guard Exit | Action |
|------------|--------|
| 0 (pass) | Proceed to Phase 6 |
| Non-zero (fail) | Revert commit, rework change (max 2 rework attempts), then discard |
If rework attempts exhausted: log as discarded with reason `guard-fail`, proceed to Phase 7.
---
## Phase 6: Decide
### Decision Matrix
| Metric Direction | Delta vs Min-Delta | Guard | Decision |
|------------------|--------------------|-------|----------|
| higher is better | delta ≥ Min-Delta | pass | **KEEP** |
| higher is better | delta < Min-Delta | pass | **DISCARD** (no progress) |
| lower is better | delta -Min-Delta | pass | **KEEP** |
| lower is better | delta > -Min-Delta | pass | **DISCARD** (no progress) |
| any | any | fail | **DISCARD** (guard fail) |
| any | verify crash | n/a | **DISCARD** (error) |
### Keep
- Update `PREV_METRIC` to current result
- Reset consecutive-discard counter to 0
### Discard
```bash
git revert HEAD --no-edit # preferred: preserves history
# fallback only if revert conflicts:
# git reset --hard HEAD~1
```
- Increment consecutive-discard counter
---
## Phase 7: Log
Append one TSV line to `loop-results.tsv`:
```
{iteration}\t{commit}\t{metric}\t{delta:+.2f}\t{status}\t{description}
```
Example:
```
3 c7d8e9f 84.7 +2.3 keep add branch coverage to tokenizer edge cases
4 - 84.7 +0.0 discard extract shared assertion helper
```
---
## Phase 8: Repeat or Stop
Continue if ALL conditions met:
- Iteration count < configured max
- Consecutive discards < 10
- User has not interrupted (check for `loop-stop` file or Ctrl-C signal)
### Stuck Detection
| Consecutive Discards | Action |
|----------------------|--------|
| 5 | Analyze `loop-results.tsv` for patterns shift strategy (different scope area, different technique) |
| 10 | **STOP** surface findings to user, recommend manual intervention |
### Final Report
When loop ends (limit reached, stuck, or interrupted):
```
Loop complete: N iterations, K kept, best metric: X (baseline: Y, delta: +Z)
Kept changes: [list commit hashes and descriptions]
Discarded: [count] iterations
Recommendation: [continue / diminishing returns / target met]
```
---
## Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|--------------|--------------|------------------|
| Multiple changes per iteration | Cannot attribute metric change to specific edit | One atomic change only |
| Verify before commit | No rollback point if verify crashes mid-run | Always commit first |
| Editing guard-scope files | Guard becomes meaningless if you edit what it checks | Guard files are read-only |
| `git reset` instead of `git revert` | Destroys history, breaks pattern analysis | Use `git revert` |
| Skipping Phase 1 review | Repeats failed patterns, wastes iterations | Always read log + diff |
| Ignoring `Min-Delta` | Micro-improvements cause noise, no real progress | Set meaningful threshold |

View File

@@ -0,0 +1,109 @@
# Git as Long-Term Memory
Git history is the loop's only persistent memory across iterations. Read it every time.
---
## Required Reads — Every Iteration
Run these at the start of Phase 1 (Review) without exception:
```bash
git log --oneline -20 # what changed and in what order
git diff HEAD~1 # exact diff of last iteration
cat loop-results.tsv # metric trend + keep/discard record
```
Together these answer three questions:
1. **What worked?** (kept=yes rows with positive delta)
2. **What failed?** (kept=no rows, repeated file paths)
3. **Where is the trend going?** (last 5 deltas — accelerating, flat, or reversing?)
---
## Pattern Recognition
### Exploit Successful Patterns
- Same file category that improved before → try adjacent files
- Same technique (e.g. adding edge-case tests) → apply to untouched functions
- Larger delta correlates with specific module → prioritize that module
### Avoid Failed Patterns
- File + technique combination that was discarded → do not retry same pair
- Zero-delta changes (e.g. refactors that don't move the metric) → skip unless required by guard
- Oscillating metric on a file → leave it, move elsewhere
### Detect Diminishing Returns
If last 5 kept iterations all have `delta < Min-Delta * 2`, the low-hanging fruit is gone. Signal:
- Broaden scope to adjacent files
- Switch technique entirely
- Report plateau to user rather than grinding
---
## Stuck Detection Integration
Track consecutive discards in a shell variable or temp file across phases:
```bash
CONSEC_DISCARDS=0 # reset on keep, increment on discard
# After Phase 6 decision:
if kept; then
CONSEC_DISCARDS=0
else
CONSEC_DISCARDS=$((CONSEC_DISCARDS + 1))
fi
# Phase 8 checks:
[ $CONSEC_DISCARDS -ge 5 ] && shift_strategy
[ $CONSEC_DISCARDS -ge 10 ] && stop_loop
```
---
## Revert vs Reset
Always prefer `git revert`. Only fall back to `git reset` when revert produces a conflict.
| Command | Preserves history | Safe for pattern analysis | Use when |
|---------|------------------|--------------------------|----------|
| `git revert HEAD --no-edit` | Yes | Yes | Default discard path |
| `git reset --hard HEAD~1` | No | No | Revert conflicts only |
Reason: `git log --grep="loop(iter-"` relies on intact history. A reset destroys the record of what was tried and silently breaks pattern analysis in future iterations.
---
## Commit Message Convention
```
loop(iter-N): <one-line description of the change>
```
Examples:
```
loop(iter-3): add null guard to parseToken in lexer.ts
loop(iter-7): split large test fixture into focused unit cases
loop(iter-12): remove unused lodash import reducing bundle 1.2kB
```
This convention enables targeted log queries:
```bash
# All loop commits
git log --oneline --grep="loop(iter-"
# Only kept changes (cross-reference with loop-results.tsv)
git log --oneline --grep="loop(iter-" | head -20
```
Reverted commits remain in history with the standard revert message:
```
Revert "loop(iter-4): ..."
```
This is intentional — discards are part of the experiment record.

View File

@@ -0,0 +1,140 @@
# Guard Pattern & Noise-Aware Verification
## Guard Pattern (Regression Prevention)
The verify command measures improvement. The guard command confirms nothing else broke.
**Separation of concerns:**
- Verify = "did the target metric improve?"
- Guard = "did anything else break?"
### How It Works
1. Baseline run: guard command must exit 0 before loop starts (establishes clean baseline)
2. After verify succeeds (Phase 5.5), run guard command — BEFORE the keep/discard decision
3. If guard exits non-zero: trigger recovery flow
### Guard Recovery Flow
```
Guard fails →
revert to previous commit →
rework attempt 1 (different approach) →
if guard fails again →
rework attempt 2 (minimal change) →
if guard fails again →
discard (log status: guard-failed)
```
**Rule:** If guard cannot pass at baseline, fix it before starting the loop — never relax the guard.
**Rule:** Guard files are READ-ONLY. Never modify test files, spec files, or guard scripts as part of an optimization attempt.
**Rule:** Guard failure means the optimization is wrong, not that the guard is wrong.
### Common Guard Commands
| Stack | Guard Command | Notes |
|-------|--------------|-------|
| Node.js | `npm test` | Runs Jest/Vitest suite |
| Python | `pytest` | Full test suite |
| Go | `go test ./...` | All packages |
| Rust | `cargo test` | Unit + integration |
| TypeScript | `tsc --noEmit && npm test` | Type check then tests |
| Any | `npm run lint && npm test` | Lint + test combined |
### Guard Command Selection Heuristic
- If optimizing runtime code → guard = full test suite
- If optimizing build/bundle → guard = `tsc --noEmit` + smoke test
- If optimizing ML pipeline → guard = test suite + data schema validation
- Default when unsure → `npm test` / `pytest` / `go test ./...`
---
## Noise-Aware Verification
Noisy metrics produce false positives. A "5% improvement" that's really measurement variance leads to keeping bad changes.
### Noise Levels
| Level | Description | Strategy |
|-------|-------------|----------|
| Low | Deterministic output (LOC, type errors, lint count) | Single run, trust result |
| Medium | Slight variance (build time ±5%, unit test timing) | 2 runs, use worse result |
| High | High variance (API latency, benchmark, ML accuracy) | 3-5 runs, use median |
### Multi-Run Median (High Noise)
```
runs = []
repeat 3-5 times:
result = run verify command
runs.append(result)
metric = median(runs)
```
Use median, not mean — median is resistant to single outlier spikes.
### Min-Delta Threshold
Only keep an attempt if improvement exceeds the threshold:
```
improvement = previous_best - new_metric # for "lower is better"
if improvement < min_delta:
status = no-op # do not keep, but not a failure
```
**Default thresholds by noise level:**
- Low noise: 0 (any improvement counts)
- Medium noise: 1-2% of baseline
- High noise: 3-5% of baseline
### Confirmation Run
For high-stakes metrics (final 3 iterations, or improvement > 20%), re-verify before committing:
```
candidate looks good →
run verify one more time →
compare to initial measurement this iteration →
if within 2% → confirm keep
if outside 2% → treat as medium noise, average the two
```
### Environment Pinning (User Responsibility)
ck:loop cannot control the environment. User must ensure:
- Fixed random seeds for ML workloads
- Warmed caches (or cold caches) consistently
- No background processes competing for CPU
- Same input data across runs
### Config Examples
**Low noise (lint errors):**
```
verify: eslint src --format json | jq '[.[] | .errorCount] | add'
noise: low
min_delta: 0
guard: npm test
```
**Medium noise (build time):**
```
verify: { start=$(date +%s%N); npm run build; echo $(( ($(date +%s%N) - start) / 1000000 )); }
noise: medium
runs: 2
min_delta: 200 # ms
guard: tsc --noEmit
```
**High noise (API latency):**
```
verify: wrk -t2 -c10 -d10s http://localhost:3000/api/health | grep 'Latency' | awk '{print $2}' | sed 's/ms//'
noise: high
runs: 5
min_delta: 5 # ms
guard: npm test
```

View File

@@ -0,0 +1,200 @@
# Metric Library
Quick-reference verify commands by domain. Copy-paste into ck:loop config.
Direction: **lower** = fewer errors/ms/bytes is better. **higher** = more coverage/accuracy is better.
## Code Quality
### Test Coverage
**Node.js — Jest**
```bash
npx jest --coverage --coverageReporters=json-summary 2>/dev/null \
| node -e "const s=require('./coverage/coverage-summary.json'); console.log(s.total.lines.pct)"
```
Direction: higher | Noise: low | Guard: `npm test`
**Node.js — Vitest**
```bash
npx vitest run --coverage 2>/dev/null \
| grep 'All files' | awk '{print $NF}' | tr -d '%'
```
Direction: higher | Noise: low | Guard: `npm test`
**Python — pytest-cov**
```bash
pytest --cov=src --cov-report=term-missing -q 2>/dev/null \
| grep 'TOTAL' | awk '{print $NF}' | tr -d '%'
```
Direction: higher | Noise: low | Guard: `pytest`
**Go**
```bash
go test ./... -coverprofile=coverage.out -covermode=atomic 2>/dev/null \
&& go tool cover -func=coverage.out | grep total | awk '{print $3}' | tr -d '%'
```
Direction: higher | Noise: low | Guard: `go test ./...`
### Lint Errors
**ESLint**
```bash
npx eslint src --format json 2>/dev/null \
| node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); console.log(r.reduce((a,f)=>a+f.errorCount,0))"
```
Direction: lower | Noise: low | Guard: `npm test`
**Pylint**
```bash
pylint src/ --output-format=json 2>/dev/null \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for m in d if m['type'] in ('error','fatal')))"
```
Direction: lower | Noise: low | Guard: `pytest`
**Clippy (Rust)**
```bash
cargo clippy --message-format=json 2>/dev/null \
| jq -r 'select(.reason=="compiler-message") | .message.level' | grep -c 'error'
```
Direction: lower | Noise: low | Guard: `cargo test`
### Type Errors
**TypeScript — tsc**
```bash
npx tsc --noEmit 2>&1 | grep -c '^src/.*error TS' || true
```
Direction: lower | Noise: low | Guard: `npm test`
**Python — mypy**
```bash
mypy src/ --ignore-missing-imports 2>&1 | tail -1 | awk '{print $1}'
```
Direction: lower | Noise: low | Guard: `pytest`
## Performance
### API Latency
**wrk (mean latency, ms)**
```bash
wrk -t2 -c10 -d10s http://localhost:3000/api/health 2>/dev/null \
| grep 'Latency' | awk '{print $2}' | sed 's/ms//'
```
Direction: lower | Noise: high | Guard: `npm test`
**curl (single request, ms)**
```bash
curl -o /dev/null -s -w "%{time_total}" http://localhost:3000/api/health \
| awk '{printf "%.0f\n", $1*1000}'
```
Direction: lower | Noise: high | Guard: `npm test`
### Build / Bundle Size
**Webpack / Vite (main bundle, bytes)**
```bash
npm run build 2>/dev/null \
&& find dist -name '*.js' ! -name '*.map' | xargs wc -c | tail -1 | awk '{print $1}'
```
Direction: lower | Noise: low | Guard: `tsc --noEmit`
**Go binary (bytes)**
```bash
go build -o /tmp/app_measure . 2>/dev/null && wc -c < /tmp/app_measure
```
Direction: lower | Noise: low | Guard: `go test ./...`
### Build Time
**Node.js (ms)**
```bash
start=$(date +%s%N); npm run build 2>/dev/null; echo $(( ($(date +%s%N) - start) / 1000000 ))
```
Direction: lower | Noise: medium | Guard: `tsc --noEmit`
**Go (ms)**
```bash
start=$(date +%s%N); go build ./... 2>/dev/null; echo $(( ($(date +%s%N) - start) / 1000000 ))
```
Direction: lower | Noise: medium | Guard: `go test ./...`
## Security
### Vulnerability Count
**npm audit**
```bash
npm audit --json 2>/dev/null \
| node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); console.log(r.metadata?.vulnerabilities?.total ?? 0)"
```
Direction: lower | Noise: low | Guard: `npm test`
**pip-audit**
```bash
pip-audit --format=json 2>/dev/null \
| python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d.get('dependencies',[])))"
```
Direction: lower | Noise: low | Guard: `pytest`
## Lines of Code
**find + wc (TS/JS)**
```bash
find src -name '*.ts' -o -name '*.js' | xargs wc -l | tail -1 | awk '{print $1}'
```
Direction: lower | Noise: low | Guard: `npm test`
**cloc (any language)**
```bash
cloc src --json 2>/dev/null | python3 -c "import json,sys; print(json.load(sys.stdin)['SUM']['code'])"
```
Direction: lower | Noise: low | Guard: `npm test`
## ML / Data Science
### Accuracy
**PyTorch (eval script)**
```bash
python3 scripts/evaluate.py --split val 2>/dev/null | grep 'accuracy' | awk '{print $NF}'
```
Direction: higher | Noise: high | Guard: `pytest tests/`
**sklearn — F1 Score**
```bash
python3 -c "from sklearn.metrics import f1_score; import numpy as np; print(f'{f1_score(np.load(\"data/y_true.npy\"), np.load(\"data/y_pred.npy\"), average=\"weighted\"):.4f}')"
```
Direction: higher | Noise: high | Guard: `pytest tests/`
## Creating Custom Metrics
### Template
```bash
# 1. Measure exactly one numeric value
# 2. Print it to stdout as the last (or only) line
# 3. Exit 0 on success, non-zero on failure (treated as crash)
# 4. Complete in < 30 seconds (or configure timeout)
# 5. Be deterministic, or declare Noise: high
YOUR_MEASURE_COMMAND | YOUR_EXTRACT_COMMAND
```
### Rules
| Rule | Detail |
|------|--------|
| One number | stdout last line must be a bare number (integer or float) |
| Exit codes | exit 0 = valid measurement, exit non-zero = crash (logged, skipped) |
| Runtime | keep under 30s; use sampling for expensive workloads |
| Determinism | if output varies run-to-run, set `noise: high` and use 3-5 runs |
| Units | consistent across all iterations; never change mid-loop |
| Direction | declare explicitly: `lower` or `higher` is better |

View File

@@ -0,0 +1,74 @@
# Results Logging
## TSV Format
One row per iteration. Tab-separated. Header row required.
```
iteration commit metric delta status description
```
### Column Definitions
| Column | Type | Notes |
|--------|------|-------|
| iteration | integer | 0-indexed. 0 = baseline. |
| commit | string | Short SHA (7 chars) or `-` if discarded/crashed |
| metric | float | Measured value from verify command |
| delta | float | Signed change from previous best. Negative = improvement for "lower is better". `-` for baseline. |
| status | enum | See status values below |
| description | string | One sentence: what was attempted |
### Status Values
| Status | Meaning |
|--------|---------|
| `baseline` | Initial measurement before any changes |
| `keep` | Improvement passed guard, committed |
| `keep (reworked)` | Failed guard on first attempt, reworked, then passed |
| `discard` | No improvement or below min-delta threshold |
| `guard-failed` | Metric improved but guard command exited non-zero; reverted |
| `crash` | Verify command errored or timed out |
| `no-op` | Improvement below min-delta threshold (not a failure, just insufficient) |
### Example Log
```tsv
iteration commit metric delta status description
0 a1b2c3d 842 - baseline Initial bundle size measurement
1 e4f5a6b 810 -32 keep Tree-shake unused lodash imports
2 - 798 -44 discard Remove dead CSS — metric improved but below min-delta
3 c7d8e9f 771 -71 keep Replace moment.js with day.js
4 - - - crash Build script errored on dynamic import rewrite
5 1a2b3c4 751 -91 guard-failed Inline critical CSS — bundle smaller but tests failed
6 5d6e7f8 758 -84 keep (reworked) Inline critical CSS with fallback (guard-safe version)
7 9a0b1c2 741 -101 keep Lazy-load admin panel chunk
```
---
## Progressive Summaries
### Every-5-Iteration Summary
Print after iterations 5, 10, 15, ...:
```
--- Progress @ iteration 5 ---
Best so far: 751 (baseline: 842, -10.8%)
Kept: 3 | Discarded: 1 | Crashed: 1 | Guard-failed: 1
Top strategy: dependency replacement (moment→day.js: -71)
```
### Final Summary
Print at loop end (budget exhausted or goal reached):
```
--- Final Summary ---
Baseline → Final: 842 → 741 (-11.9%, -101 units)
Iterations: 7 total | Kept: 4 | Discarded: 1 | Crashed: 1 | Guard-failed: 1
Best single iteration: #7 lazy-load admin chunk (-20)
Worst outcome: #4 crash (build script)
Key insight: Dependency replacement yielded most gains; CSS inlining required guard-safe rework
```