init
@@ -0,0 +1,193 @@
|
||||
# Autonomous Loop Protocol
|
||||
|
||||
8-phase specification executed each iteration. Complete phases in order — no skipping.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0: Precondition Checks (first iteration only)
|
||||
|
||||
Run once before the loop starts. Abort with clear error if any check fails.
|
||||
|
||||
1. Confirm current directory is a git repository (`git rev-parse --git-dir`)
|
||||
2. Confirm working tree is clean (`git status --porcelain` → empty output)
|
||||
3. Confirm current HEAD is on a named branch (not detached)
|
||||
4. Check no stale lock files (`loop-results.tsv.lock`)
|
||||
5. Resolve scope glob — confirm at least one file matches
|
||||
6. Dry-run verify command — confirm it exits 0 and outputs a number
|
||||
7. Dry-run guard command (if set) — confirm it exits 0
|
||||
8. Record **baseline metric** as iteration 0 in `loop-results.tsv`
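
The eight checks above could be sketched as a single preflight function. This is only an illustration under stated assumptions: `VERIFY_CMD`, `GUARD_CMD`, and `SCOPE_GLOB` are assumed variable names, and `is_number`, `fail`, and `preflight` are hypothetical helpers, not part of the protocol.

```bash
# Sketch only: VERIFY_CMD, GUARD_CMD, and SCOPE_GLOB are assumed variable names.

# Succeeds if $1 is a bare integer or decimal number.
is_number() {
  printf '%s\n' "$1" | grep -Eq '^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)$'
}

fail() { echo "precondition failed: $1" >&2; return 1; }

preflight() {
  git rev-parse --git-dir >/dev/null 2>&1 || { fail "not a git repository"; return 1; }
  [ -z "$(git status --porcelain)" ]      || { fail "working tree is not clean"; return 1; }
  git symbolic-ref -q HEAD >/dev/null     || { fail "HEAD is detached"; return 1; }
  [ ! -e loop-results.tsv.lock ]          || { fail "stale lock file present"; return 1; }
  # Unquoted on purpose so the shell expands the glob; an unmatched pattern stays literal.
  ( set -- $SCOPE_GLOB; [ -e "$1" ] )     || { fail "scope glob matches no files"; return 1; }

  output=$(eval "$VERIFY_CMD")            || { fail "verify command exited non-zero"; return 1; }
  baseline=$(printf '%s\n' "$output" | tail -n 1)
  is_number "$baseline"                   || { fail "verify command printed no number"; return 1; }

  if [ -n "${GUARD_CMD:-}" ]; then
    eval "$GUARD_CMD" >/dev/null 2>&1     || { fail "guard command exited non-zero"; return 1; }
  fi

  # Check 8: write the header row, then the baseline as iteration 0.
  printf 'iteration\tcommit\tmetric\tdelta\tstatus\tdescription\n' > loop-results.tsv
  printf '0\t%s\t%s\t-\tbaseline\t%s\n' \
    "$(git rev-parse --short HEAD)" "$baseline" "initial measurement" >> loop-results.tsv
}
```

Each check returns early with a message on stderr, so the first failing precondition is the one reported.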
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Review
|
||||
|
||||
Read context before every iteration — do not skip even if "nothing changed".
|
||||
|
||||
```bash
|
||||
git log --oneline -20 # recent history
|
||||
git diff HEAD~1 # last change detail
|
||||
cat loop-results.tsv # full results so far
|
||||
```
|
||||
|
||||
Extract patterns:
|
||||
- Which file types / functions yielded improvements?
|
||||
- Which changes were consistently discarded?
|
||||
- Is the metric trending, plateauing, or oscillating?
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Ideate
|
||||
|
||||
Pick **ONE** focused change. Rules:
|
||||
|
||||
- **Exploit** patterns from successful iterations
|
||||
- **Avoid** repeating failed patterns (same file + same approach)
|
||||
- **Atomicity test:** describe the change in one sentence. If the sentence contains "and", split into two iterations.
|
||||
- Prefer high-leverage targets (files with low coverage, large bundle contributors, most lint errors)
|
||||
- When stuck (3+ consecutive discards on same area), pivot to a different file or technique
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Modify
|
||||
|
||||
- Edit files within `Scope` only
|
||||
- **Never** modify files referenced by the `Guard` command
|
||||
- Ensure syntax is valid after edit (run `tsc --noEmit` or equivalent linter for the language)
|
||||
- Keep changes minimal — one logical unit
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Commit
|
||||
|
||||
Commit **before** running verification. Git is the undo mechanism, not a post-hoc save.
|
||||
|
||||
```bash
|
||||
git add <changed files>
|
||||
git commit -m "loop(iter-N): <one-line description>"
|
||||
```
|
||||
|
||||
Convention: `loop(iter-N):` prefix enables log filtering later.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Verify
|
||||
|
||||
Run the configured verify command. Extract the numeric result.
|
||||
|
||||
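```bash
OUTPUT=$(eval "$VERIFY_CMD")                    # a non-zero exit here is a verify crash
RESULT=$(printf '%s\n' "$OUTPUT" | tail -n 1)   # the metric is the last stdout line
DELTA=$(echo "$RESULT - $PREV_METRIC" | bc)
```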
|
||||
|
||||
### Crash Recovery
|
||||
|
||||
| Outcome | Meaning | Action |
|---------|---------|--------|
| Exit 0, number printed | Success | Proceed to Phase 5.5 / 6 |
| Exit 0, no number | Bad command | Log `error:no-number`, revert, fix verify cmd |
| Exit non-zero | Verify crash | Log `error:verify-crash`, revert, treat as discard |
| Timeout (>30s) | Too slow | Log `error:timeout`, abort loop, surface to user |
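
The four outcomes in the table could be told apart with a small classifier like the one below. It is a sketch, not the protocol's implementation: `classify_verify` is a hypothetical helper, and the timeout case assumes the verify command is wrapped in GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```bash
# Sketch only: classify_verify is a hypothetical helper.
# $1 = exit status of the verify command, $2 = its captured stdout.
classify_verify() {
  local code=$1 last_line
  last_line=$(printf '%s\n' "$2" | tail -n 1)
  if [ "$code" -eq 124 ]; then
    echo "error:timeout"          # GNU timeout exits with 124 when the limit is hit
  elif [ "$code" -ne 0 ]; then
    echo "error:verify-crash"
  elif printf '%s\n' "$last_line" | grep -Eq '^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)$'; then
    echo "ok"
  else
    echo "error:no-number"
  fi
}

# Example call, assuming GNU coreutils `timeout` is available:
#   output=$(timeout 30 sh -c "$VERIFY_CMD"); classify_verify "$?" "$output"
```

Only the last stdout line is checked for a number, which matches the custom-metric rule that the final line must be a bare number.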
|
||||
|
||||
---
|
||||
|
||||
## Phase 5.5: Guard (optional — skip if no Guard configured)
|
||||
|
||||
Run guard command after verify.
|
||||
|
||||
```bash
|
||||
eval "$GUARD_CMD"
|
||||
GUARD_EXIT=$?
|
||||
```
|
||||
|
||||
| Guard Exit | Action |
|------------|--------|
| 0 (pass) | Proceed to Phase 6 |
| Non-zero (fail) | Revert commit, rework change (max 2 rework attempts), then discard |
|
||||
|
||||
If rework attempts are exhausted, log the iteration with status `guard-failed` and proceed to Phase 7.
|
||||
|
||||
---
|
||||
|
||||
## Phase 6: Decide
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Metric Direction | Delta vs Min-Delta | Guard | Decision |
|------------------|--------------------|-------|----------|
| higher is better | delta ≥ Min-Delta | pass | **KEEP** |
| higher is better | delta < Min-Delta | pass | **DISCARD** (no progress) |
| lower is better | delta ≤ -Min-Delta | pass | **KEEP** |
| lower is better | delta > -Min-Delta | pass | **DISCARD** (no progress) |
| any | any | fail | **DISCARD** (guard fail) |
| any | verify crash | n/a | **DISCARD** (error) |
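
As an illustration, the matrix can be expressed as one function. This is a sketch under assumptions: `decide` is a hypothetical helper, the verify-crash row is assumed to be handled before it is called, and `awk` is used for the comparison because shell arithmetic does not handle decimals.

```bash
# Sketch only: decide is a hypothetical helper.
# $1 = direction (higher|lower), $2 = delta, $3 = Min-Delta, $4 = guard result (pass|fail)
decide() {
  if [ "$4" = "fail" ]; then
    echo "discard"
    return
  fi
  awk -v dir="$1" -v d="$2" -v m="$3" 'BEGIN {
    d += 0; m += 0                      # force numeric comparison
    keep = (dir == "higher") ? (d >= m) : (d <= -m)
    print (keep ? "keep" : "discard")
  }'
}

# Example call:
#   decide lower "$DELTA" 10 pass
```

Note that with a Min-Delta of 0 a delta of exactly 0 counts as a keep, which follows directly from the `≥` and `≤` rows of the matrix.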
|
||||
|
||||
### Keep
|
||||
|
||||
- Update `PREV_METRIC` to current result
|
||||
- Reset consecutive-discard counter to 0
|
||||
|
||||
### Discard
|
||||
|
||||
```bash
|
||||
git revert HEAD --no-edit # preferred: preserves history
|
||||
# fallback only if revert conflicts:
|
||||
# git reset --hard HEAD~1
|
||||
```
|
||||
|
||||
- Increment consecutive-discard counter
|
||||
|
||||
---
|
||||
|
||||
## Phase 7: Log
|
||||
|
||||
Append one TSV line to `loop-results.tsv`:
|
||||
|
||||
```
|
||||
{iteration}\t{commit}\t{metric}\t{delta:+.2f}\t{status}\t{description}
|
||||
```
|
||||
|
||||
Example:
|
||||
```
|
||||
3 c7d8e9f 84.7 +2.3 keep add branch coverage to tokenizer edge cases
|
||||
4 - 84.7 +0.0 discard extract shared assertion helper
|
||||
```
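
One way to append a row in this format is with `printf`, which handles both the tab separators and the signed two-decimal delta. The helper name `log_row` and its argument order are assumptions made for this sketch.

```bash
# Sketch only: log_row is a hypothetical helper.
# $1 = TSV file, $2 = iteration, $3 = commit, $4 = metric,
# $5 = delta, $6 = status, $7 = description
log_row() {
  printf '%s\t%s\t%s\t%+.2f\t%s\t%s\n' "$2" "$3" "$4" "$5" "$6" "$7" >> "$1"
}

# Example call:
#   log_row loop-results.tsv 3 "$(git rev-parse --short HEAD)" "$RESULT" "$DELTA" keep "add branch coverage to tokenizer edge cases"
```

The description goes last so that any spaces in it cannot be confused with column boundaries.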
|
||||
|
||||
---
|
||||
|
||||
## Phase 8: Repeat or Stop
|
||||
|
||||
Continue if ALL conditions met:
|
||||
- Iteration count < configured max
|
||||
- Consecutive discards < 10
|
||||
- User has not interrupted (check for `loop-stop` file or Ctrl-C signal)
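
The three conditions above can be combined into a single test. A minimal sketch, assuming the iteration count, configured maximum, and consecutive-discard counter are passed in as arguments; `should_continue` is a hypothetical name, and only the `loop-stop` file (not the Ctrl-C signal) is checked here.

```bash
# Sketch only: should_continue is a hypothetical helper.
# $1 = current iteration, $2 = configured max, $3 = consecutive discards
should_continue() {
  [ "$1" -lt "$2" ] && [ "$3" -lt 10 ] && [ ! -e loop-stop ]
}

# Example call:
#   should_continue "$ITERATION" "$MAX_ITERATIONS" "$CONSEC_DISCARDS" || echo "stopping"
```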
|
||||
|
||||
### Stuck Detection
|
||||
|
||||
| Consecutive Discards | Action |
|----------------------|--------|
| 5 | Analyze `loop-results.tsv` for patterns → shift strategy (different scope area, different technique) |
| 10 | **STOP**: surface findings to user, recommend manual intervention |
|
||||
|
||||
### Final Report
|
||||
|
||||
When loop ends (limit reached, stuck, or interrupted):
|
||||
|
||||
```
|
||||
Loop complete: N iterations, K kept, best metric: X (baseline: Y, delta: +Z)
|
||||
Kept changes: [list commit hashes and descriptions]
|
||||
Discarded: [count] iterations
|
||||
Recommendation: [continue / diminishing returns / target met]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Why It Fails | Correct Approach |
|--------------|--------------|------------------|
| Multiple changes per iteration | Cannot attribute metric change to specific edit | One atomic change only |
| Verify before commit | No rollback point if verify crashes mid-run | Always commit first |
| Editing guard-scope files | Guard becomes meaningless if you edit what it checks | Guard files are read-only |
| `git reset` instead of `git revert` | Destroys history, breaks pattern analysis | Use `git revert` |
| Skipping Phase 1 review | Repeats failed patterns, wastes iterations | Always read log + diff |
| Ignoring `Min-Delta` | Micro-improvements cause noise, no real progress | Set meaningful threshold |
|
||||
@@ -0,0 +1,109 @@
|
||||
# Git as Long-Term Memory
|
||||
|
||||
Git history is the loop's only persistent memory across iterations. Read it every time.
|
||||
|
||||
---
|
||||
|
||||
## Required Reads — Every Iteration
|
||||
|
||||
Run these at the start of Phase 1 (Review) without exception:
|
||||
|
||||
```bash
|
||||
git log --oneline -20 # what changed and in what order
|
||||
git diff HEAD~1 # exact diff of last iteration
|
||||
cat loop-results.tsv # metric trend + keep/discard record
|
||||
```
|
||||
|
||||
Together these answer three questions:
|
||||
1. **What worked?** (rows with status `keep` and a delta in the improving direction)
2. **What failed?** (rows with status `discard`, `guard-failed`, or `crash`; repeated file paths)
3. **Where is the trend going?** (last 5 deltas: accelerating, flat, or reversing?)
|
||||
|
||||
---
|
||||
|
||||
## Pattern Recognition
|
||||
|
||||
### Exploit Successful Patterns
|
||||
|
||||
- Same file category that improved before → try adjacent files
|
||||
- Same technique (e.g. adding edge-case tests) → apply to untouched functions
|
||||
- Larger delta correlates with specific module → prioritize that module
|
||||
|
||||
### Avoid Failed Patterns
|
||||
|
||||
- File + technique combination that was discarded → do not retry same pair
|
||||
- Zero-delta changes (e.g. refactors that don't move the metric) → skip unless required by guard
|
||||
- Oscillating metric on a file → leave it, move elsewhere
|
||||
|
||||
### Detect Diminishing Returns
|
||||
|
||||
If last 5 kept iterations all have `delta < Min-Delta * 2`, the low-hanging fruit is gone. Signal:
|
||||
- Broaden scope to adjacent files
|
||||
- Switch technique entirely
|
||||
- Report plateau to user rather than grinding
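
The plateau test described above could be run directly against `loop-results.tsv`. The sketch below assumes the six-column layout used elsewhere in this document (delta in column 4, status in column 5) and compares the absolute value of each delta, so it works for either metric direction; `diminishing_returns` is a hypothetical helper name.

```bash
# Sketch only: diminishing_returns is a hypothetical helper.
# $1 = path to loop-results.tsv, $2 = Min-Delta
# Exits 0 when the last 5 kept iterations all have |delta| < 2 * Min-Delta.
diminishing_returns() {
  awk -F'\t' -v m="$2" '
    $5 ~ /^keep/ {                      # matches "keep" and "keep (reworked)"
      d = $4 + 0
      if (d < 0) d = -d
      kept[++n] = d
    }
    END {
      if (n < 5) exit 1                 # not enough kept iterations to judge
      for (i = n - 4; i <= n; i++)
        if (kept[i] >= 2 * m) exit 1
      exit 0
    }' "$1"
}

# Example call:
#   diminishing_returns loop-results.tsv 1 && echo "plateau: broaden scope or report to user"
```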
|
||||
|
||||
---
|
||||
|
||||
## Stuck Detection Integration
|
||||
|
||||
Track consecutive discards in a shell variable or temp file across phases:
|
||||
|
||||
```bash
CONSEC_DISCARDS=0   # reset on keep, increment on discard

# After the Phase 6 decision ($STATUS is assumed to hold the row status, e.g. keep or discard):
case "$STATUS" in
  keep*) CONSEC_DISCARDS=0 ;;
  *)     CONSEC_DISCARDS=$((CONSEC_DISCARDS + 1)) ;;
esac

# Phase 8 checks (stop_loop and shift_strategy are placeholders for the loop's own handlers):
if [ "$CONSEC_DISCARDS" -ge 10 ]; then
  stop_loop
elif [ "$CONSEC_DISCARDS" -ge 5 ]; then
  shift_strategy
fi
```
|
||||
|
||||
---
|
||||
|
||||
## Revert vs Reset
|
||||
|
||||
Always prefer `git revert`. Only fall back to `git reset` when revert produces a conflict.
|
||||
|
||||
| Command | Preserves history | Safe for pattern analysis | Use when |
|---------|------------------|--------------------------|----------|
| `git revert HEAD --no-edit` | Yes | Yes | Default discard path |
| `git reset --hard HEAD~1` | No | No | Revert conflicts only |
|
||||
|
||||
Reason: `git log --grep="loop(iter-"` relies on intact history. A reset destroys the record of what was tried and silently breaks pattern analysis in future iterations.
|
||||
|
||||
---
|
||||
|
||||
## Commit Message Convention
|
||||
|
||||
```
|
||||
loop(iter-N): <one-line description of the change>
|
||||
```
|
||||
|
||||
Examples:
|
||||
```
|
||||
loop(iter-3): add null guard to parseToken in lexer.ts
|
||||
loop(iter-7): split large test fixture into focused unit cases
|
||||
loop(iter-12): remove unused lodash import reducing bundle 1.2kB
|
||||
```
|
||||
|
||||
This convention enables targeted log queries:
|
||||
|
||||
```bash
# All loop commits (their reverts also match, because the revert subject quotes the original)
git log --oneline --grep="loop(iter-"

# The 20 most recent loop commits; cross-reference with loop-results.tsv to see which were kept
git log --oneline --grep="loop(iter-" | head -20
```
|
||||
|
||||
Reverted commits remain in history with the standard revert message:
|
||||
```
|
||||
Revert "loop(iter-4): ..."
|
||||
```
|
||||
|
||||
This is intentional — discards are part of the experiment record.
|
||||
140
.opencode/skills/ck-autoresearch/references/guard-and-noise.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# Guard Pattern & Noise-Aware Verification
|
||||
|
||||
## Guard Pattern (Regression Prevention)
|
||||
|
||||
The verify command measures improvement. The guard command confirms nothing else broke.
|
||||
|
||||
**Separation of concerns:**
|
||||
- Verify = "did the target metric improve?"
|
||||
- Guard = "did anything else break?"
|
||||
|
||||
### How It Works
|
||||
|
||||
1. Baseline run: guard command must exit 0 before loop starts (establishes clean baseline)
|
||||
2. After verify succeeds (Phase 5.5), run guard command — BEFORE the keep/discard decision
|
||||
3. If guard exits non-zero: trigger recovery flow
|
||||
|
||||
### Guard Recovery Flow
|
||||
|
||||
```
|
||||
Guard fails →
|
||||
revert to previous commit →
|
||||
rework attempt 1 (different approach) →
|
||||
if guard fails again →
|
||||
rework attempt 2 (minimal change) →
|
||||
if guard fails again →
|
||||
discard (log status: guard-failed)
|
||||
```
|
||||
|
||||
**Rule:** If guard cannot pass at baseline, fix it before starting the loop — never relax the guard.
|
||||
|
||||
**Rule:** Guard files are READ-ONLY. Never modify test files, spec files, or guard scripts as part of an optimization attempt.
|
||||
|
||||
**Rule:** Guard failure means the optimization is wrong, not that the guard is wrong.
|
||||
|
||||
### Common Guard Commands
|
||||
|
||||
| Stack | Guard Command | Notes |
|-------|--------------|-------|
| Node.js | `npm test` | Runs Jest/Vitest suite |
| Python | `pytest` | Full test suite |
| Go | `go test ./...` | All packages |
| Rust | `cargo test` | Unit + integration |
| TypeScript | `tsc --noEmit && npm test` | Type check then tests |
| Any | `npm run lint && npm test` | Lint + test combined |
|
||||
|
||||
### Guard Command Selection Heuristic
|
||||
|
||||
- If optimizing runtime code → guard = full test suite
|
||||
- If optimizing build/bundle → guard = `tsc --noEmit` + smoke test
|
||||
- If optimizing ML pipeline → guard = test suite + data schema validation
|
||||
- Default when unsure → `npm test` / `pytest` / `go test ./...`
|
||||
|
||||
---
|
||||
|
||||
## Noise-Aware Verification
|
||||
|
||||
Noisy metrics produce false positives. A "5% improvement" that's really measurement variance leads to keeping bad changes.
|
||||
|
||||
### Noise Levels
|
||||
|
||||
| Level | Description | Strategy |
|-------|-------------|----------|
| Low | Deterministic output (LOC, type errors, lint count) | Single run, trust result |
| Medium | Slight variance (build time ±5%, unit test timing) | 2 runs, use worse result |
| High | High variance (API latency, benchmark, ML accuracy) | 3-5 runs, use median |
|
||||
|
||||
### Multi-Run Median (High Noise)
|
||||
|
||||
```
|
||||
runs = []
|
||||
repeat 3-5 times:
|
||||
result = run verify command
|
||||
runs.append(result)
|
||||
metric = median(runs)
|
||||
```
|
||||
|
||||
Use median, not mean — median is resistant to single outlier spikes.
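
In shell, the median can be computed with `sort` and `awk`. A minimal sketch, assuming one number per input line; `median` is a hypothetical helper name, and for an even number of runs it returns the mean of the two middle values.

```bash
# Sketch only: median is a hypothetical helper.
# Reads one number per line on stdin and prints the median.
median() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Example call (5 runs; keeps only the last stdout line of each run):
#   for i in 1 2 3 4 5; do eval "$VERIFY_CMD" | tail -n 1; done | median
```

A single slow run (for example 48 among 11, 12, 12) moves the mean noticeably but leaves the median at 12, which is the property the paragraph above relies on.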
|
||||
|
||||
### Min-Delta Threshold
|
||||
|
||||
Only keep an attempt if improvement exceeds the threshold:
|
||||
|
||||
```
|
||||
improvement = previous_best - new_metric # for "lower is better"
|
||||
if improvement < min_delta:
|
||||
status = no-op # do not keep, but not a failure
|
||||
```
|
||||
|
||||
**Default thresholds by noise level:**
|
||||
- Low noise: 0 (any improvement counts)
|
||||
- Medium noise: 1-2% of baseline
|
||||
- High noise: 3-5% of baseline
|
||||
|
||||
### Confirmation Run
|
||||
|
||||
For high-stakes measurements (the final 3 iterations, or an improvement > 20%), re-verify before deciding to keep:
|
||||
|
||||
```
|
||||
candidate looks good →
|
||||
run verify one more time →
|
||||
compare to initial measurement this iteration →
|
||||
if within 2% → confirm keep
|
||||
if outside 2% → treat as medium noise, average the two
|
||||
```
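
The "within 2%" comparison in the flow above might be sketched as follows. `within_pct` is a hypothetical helper, and measuring the tolerance relative to the first measurement is an assumption; the document does not say which of the two values is the reference.

```bash
# Sketch only: within_pct is a hypothetical helper.
# $1 = first measurement, $2 = confirmation measurement, $3 = tolerance in percent
# Exits 0 when the two measurements differ by at most $3 percent of the first.
within_pct() {
  awk -v a="$1" -v b="$2" -v p="$3" 'BEGIN {
    a += 0; b += 0; p += 0
    diff = a - b; if (diff < 0) diff = -diff
    base = (a < 0) ? -a : a
    exit ((diff <= base * p / 100) ? 0 : 1)
  }'
}

# Example call:
#   within_pct "$FIRST" "$SECOND" 2 && echo "confirm keep"
```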
|
||||
|
||||
### Environment Pinning (User Responsibility)
|
||||
|
||||
ck:loop cannot control the environment. User must ensure:
|
||||
- Fixed random seeds for ML workloads
|
||||
- Warmed caches (or cold caches) consistently
|
||||
- No background processes competing for CPU
|
||||
- Same input data across runs
|
||||
|
||||
### Config Examples
|
||||
|
||||
**Low noise (lint errors):**
|
||||
```
|
||||
verify: eslint src --format json | jq '[.[] | .errorCount] | add'
|
||||
noise: low
|
||||
min_delta: 0
|
||||
guard: npm test
|
||||
```
|
||||
|
||||
**Medium noise (build time):**
|
||||
```
|
||||
verify: { start=$(date +%s%N); npm run build; echo $(( ($(date +%s%N) - start) / 1000000 )); }
|
||||
noise: medium
|
||||
runs: 2
|
||||
min_delta: 200 # ms
|
||||
guard: tsc --noEmit
|
||||
```
|
||||
|
||||
**High noise (API latency):**
|
||||
```
|
||||
verify: wrk -t2 -c10 -d10s http://localhost:3000/api/health | grep 'Latency' | awk '{print $2}' | sed 's/ms//'
|
||||
noise: high
|
||||
runs: 5
|
||||
min_delta: 5 # ms
|
||||
guard: npm test
|
||||
```
|
||||
200
.opencode/skills/ck-autoresearch/references/metric-library.md
Normal file
@@ -0,0 +1,200 @@
|
||||
# Metric Library
|
||||
|
||||
Quick-reference verify commands by domain. Copy-paste into ck:loop config.
|
||||
Direction: **lower** = fewer errors/ms/bytes is better. **higher** = more coverage/accuracy is better.
|
||||
|
||||
## Code Quality
|
||||
|
||||
### Test Coverage
|
||||
|
||||
**Node.js — Jest**
|
||||
```bash
# Run Jest to completion first; a pipe would start node before the summary file is written.
npx jest --coverage --coverageReporters=json-summary >/dev/null 2>&1 \
  && node -e "const s=require('./coverage/coverage-summary.json'); console.log(s.total.lines.pct)"
```
|
||||
Direction: higher | Noise: low | Guard: `npm test`
|
||||
|
||||
**Node.js — Vitest**
|
||||
```bash
# Columns in the text report are separated by "|"; field 5 is "% Lines".
npx vitest run --coverage 2>/dev/null \
  | grep 'All files' | awk -F'|' '{gsub(/ /, "", $5); print $5}'
```
|
||||
Direction: higher | Noise: low | Guard: `npm test`
|
||||
|
||||
**Python — pytest-cov**
|
||||
```bash
|
||||
pytest --cov=src --cov-report=term-missing -q 2>/dev/null \
|
||||
| grep 'TOTAL' | awk '{print $NF}' | tr -d '%'
|
||||
```
|
||||
Direction: higher | Noise: low | Guard: `pytest`
|
||||
|
||||
**Go**
|
||||
```bash
|
||||
go test ./... -coverprofile=coverage.out -covermode=atomic 2>/dev/null \
|
||||
&& go tool cover -func=coverage.out | grep total | awk '{print $3}' | tr -d '%'
|
||||
```
|
||||
Direction: higher | Noise: low | Guard: `go test ./...`
|
||||
|
||||
|
||||
### Lint Errors
|
||||
|
||||
**ESLint**
|
||||
```bash
|
||||
npx eslint src --format json 2>/dev/null \
|
||||
| node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); console.log(r.reduce((a,f)=>a+f.errorCount,0))"
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `npm test`
|
||||
|
||||
**Pylint**
|
||||
```bash
|
||||
pylint src/ --output-format=json 2>/dev/null \
|
||||
| python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for m in d if m['type'] in ('error','fatal')))"
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `pytest`
|
||||
|
||||
**Clippy (Rust)**
|
||||
```bash
# grep -c exits 1 when the count is 0, so "|| true" keeps a clean result from looking like a crash.
cargo clippy --message-format=json 2>/dev/null \
  | jq -r 'select(.reason=="compiler-message") | .message.level' | grep -c 'error' || true
```
|
||||
Direction: lower | Noise: low | Guard: `cargo test`
|
||||
|
||||
|
||||
### Type Errors
|
||||
|
||||
**TypeScript — tsc**
|
||||
```bash
|
||||
npx tsc --noEmit 2>&1 | grep -c '^src/.*error TS' || true
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `npm test`
|
||||
|
||||
**Python — mypy**
|
||||
```bash
# Count error lines; the summary line starts with "Found" or "Success:", not a number.
mypy src/ --ignore-missing-imports 2>&1 | grep -c ': error:' || true
```
|
||||
Direction: lower | Noise: low | Guard: `pytest`
|
||||
|
||||
|
||||
## Performance
|
||||
|
||||
### API Latency
|
||||
|
||||
**wrk (mean latency, ms)**
|
||||
```bash
|
||||
wrk -t2 -c10 -d10s http://localhost:3000/api/health 2>/dev/null \
|
||||
| grep 'Latency' | awk '{print $2}' | sed 's/ms//'
|
||||
```
|
||||
Direction: lower | Noise: high | Guard: `npm test`
|
||||
|
||||
**curl (single request, ms)**
|
||||
```bash
|
||||
curl -o /dev/null -s -w "%{time_total}" http://localhost:3000/api/health \
|
||||
| awk '{printf "%.0f\n", $1*1000}'
|
||||
```
|
||||
Direction: lower | Noise: high | Guard: `npm test`
|
||||
|
||||
|
||||
### Build / Bundle Size
|
||||
|
||||
**Webpack / Vite (main bundle, bytes)**
|
||||
```bash
|
||||
npm run build 2>/dev/null \
|
||||
&& find dist -name '*.js' ! -name '*.map' | xargs wc -c | tail -1 | awk '{print $1}'
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `tsc --noEmit`
|
||||
|
||||
**Go binary (bytes)**
|
||||
```bash
|
||||
go build -o /tmp/app_measure . 2>/dev/null && wc -c < /tmp/app_measure
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `go test ./...`
|
||||
|
||||
|
||||
### Build Time
|
||||
|
||||
**Node.js (ms)**
|
||||
```bash
|
||||
start=$(date +%s%N); npm run build 2>/dev/null; echo $(( ($(date +%s%N) - start) / 1000000 ))
|
||||
```
|
||||
Direction: lower | Noise: medium | Guard: `tsc --noEmit`
|
||||
|
||||
**Go (ms)**
|
||||
```bash
|
||||
start=$(date +%s%N); go build ./... 2>/dev/null; echo $(( ($(date +%s%N) - start) / 1000000 ))
|
||||
```
|
||||
Direction: lower | Noise: medium | Guard: `go test ./...`
|
||||
|
||||
|
||||
## Security
|
||||
|
||||
### Vulnerability Count
|
||||
|
||||
**npm audit**
|
||||
```bash
|
||||
npm audit --json 2>/dev/null \
|
||||
| node -e "const r=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8')); console.log(r.metadata?.vulnerabilities?.total ?? 0)"
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `npm test`
|
||||
|
||||
**pip-audit**
|
||||
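```bash
# Sum the "vulns" entries per dependency; counting dependencies alone would count every audited package.
pip-audit --format=json 2>/dev/null \
  | python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(len(p.get('vulns',[])) for p in d.get('dependencies',[])))"
```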
|
||||
Direction: lower | Noise: low | Guard: `pytest`
|
||||
|
||||
## Lines of Code
|
||||
|
||||
**find + wc (TS/JS)**
|
||||
```bash
|
||||
find src -name '*.ts' -o -name '*.js' | xargs wc -l | tail -1 | awk '{print $1}'
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `npm test`
|
||||
|
||||
**cloc (any language)**
|
||||
```bash
|
||||
cloc src --json 2>/dev/null | python3 -c "import json,sys; print(json.load(sys.stdin)['SUM']['code'])"
|
||||
```
|
||||
Direction: lower | Noise: low | Guard: `npm test`
|
||||
|
||||
## ML / Data Science
|
||||
|
||||
### Accuracy
|
||||
|
||||
**PyTorch (eval script)**
|
||||
```bash
|
||||
python3 scripts/evaluate.py --split val 2>/dev/null | grep 'accuracy' | awk '{print $NF}'
|
||||
```
|
||||
Direction: higher | Noise: high | Guard: `pytest tests/`
|
||||
|
||||
**sklearn — F1 Score**
|
||||
```bash
|
||||
python3 -c "from sklearn.metrics import f1_score; import numpy as np; print(f'{f1_score(np.load(\"data/y_true.npy\"), np.load(\"data/y_pred.npy\"), average=\"weighted\"):.4f}')"
|
||||
```
|
||||
Direction: higher | Noise: high | Guard: `pytest tests/`
|
||||
|
||||
|
||||
## Creating Custom Metrics
|
||||
|
||||
### Template
|
||||
|
||||
```bash
|
||||
# 1. Measure exactly one numeric value
|
||||
# 2. Print it to stdout as the last (or only) line
|
||||
# 3. Exit 0 on success, non-zero on failure (treated as crash)
|
||||
# 4. Complete in < 30 seconds (or configure timeout)
|
||||
# 5. Be deterministic, or declare Noise: high
|
||||
|
||||
YOUR_MEASURE_COMMAND | YOUR_EXTRACT_COMMAND
|
||||
```
|
||||
|
||||
### Rules
|
||||
|
||||
| Rule | Detail |
|------|--------|
| One number | stdout last line must be a bare number (integer or float) |
| Exit codes | exit 0 = valid measurement, exit non-zero = crash (logged, skipped) |
| Runtime | keep under 30s; use sampling for expensive workloads |
| Determinism | if output varies run-to-run, set `noise: high` and use 3-5 runs |
| Units | consistent across all iterations; never change mid-loop |
| Direction | declare explicitly: `lower` or `higher` is better |
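
As a concrete instance of the template and rules, here is a sketch of a custom metric that counts `TODO` markers. The metric itself and the helper name `count_todos` are illustrative assumptions, and the `-r`, `-h`, and `-o` flags assume GNU or BSD `grep`.

```bash
# Sketch only: count_todos is a hypothetical custom metric.
# $1 = directory to scan; prints the number of TODO occurrences as a bare integer.
count_todos() {
  # grep -o prints one line per match; tr strips the padding some wc builds add.
  grep -rho 'TODO' "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Example verify command:
#   count_todos src
```

Direction: lower | Noise: low. The pipeline ends in `tr`, so it exits 0 and prints `0` even when `grep` finds nothing, which keeps a clean result from being treated as a crash.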
|
||||
@@ -0,0 +1,74 @@
|
||||
# Results Logging
|
||||
|
||||
## TSV Format
|
||||
|
||||
One row per iteration. Tab-separated. Header row required.
|
||||
|
||||
```
|
||||
iteration commit metric delta status description
|
||||
```
|
||||
|
||||
### Column Definitions
|
||||
|
||||
| Column | Type | Notes |
|--------|------|-------|
| iteration | integer | 0-indexed. 0 = baseline. |
| commit | string | Short SHA (7 chars) or `-` if discarded/crashed |
| metric | float | Measured value from verify command |
| delta | float | Signed change from previous best. Negative = improvement for "lower is better". `-` for baseline. |
| status | enum | See status values below |
| description | string | One sentence: what was attempted |
|
||||
|
||||
### Status Values
|
||||
|
||||
| Status | Meaning |
|--------|---------|
| `baseline` | Initial measurement before any changes |
| `keep` | Improvement passed guard, committed |
| `keep (reworked)` | Failed guard on first attempt, reworked, then passed |
| `discard` | No improvement or below min-delta threshold |
| `guard-failed` | Metric improved but guard command exited non-zero; reverted |
| `crash` | Verify command errored or timed out |
| `no-op` | Improvement below min-delta threshold (not a failure, just insufficient) |
|
||||
|
||||
### Example Log
|
||||
|
||||
```tsv
iteration commit metric delta status description
0 a1b2c3d 842 - baseline Initial bundle size measurement
1 e4f5a6b 810 -32 keep Tree-shake unused lodash imports
2 - 798 -12 discard Remove dead CSS (metric improved but below min-delta)
3 c7d8e9f 771 -39 keep Replace moment.js with day.js
4 - - - crash Build script errored on dynamic import rewrite
5 1a2b3c4 751 -20 guard-failed Inline critical CSS (bundle smaller but tests failed)
6 5d6e7f8 758 -13 keep (reworked) Inline critical CSS with fallback (guard-safe version)
7 9a0b1c2 741 -17 keep Lazy-load admin panel chunk
```

Each delta is measured against the previous best kept metric, as defined in the column table. For example, iteration 3 is 771 - 810 = -39, because the discarded iteration 2 never became the best.

---

## Progressive Summaries

### Every-5-Iteration Summary

Print after iterations 5, 10, 15, ...:

```
--- Progress @ iteration 5 ---
Best so far: 771 (baseline: 842, -8.4%)
Kept: 2 | Discarded: 1 | Crashed: 1 | Guard-failed: 1
Top strategy: dependency replacement (moment→day.js: -39)
```

### Final Summary

Print at loop end (budget exhausted or goal reached):

```
--- Final Summary ---
Baseline → Final: 842 → 741 (-12.0%, -101 units)
Iterations: 7 total | Kept: 4 | Discarded: 1 | Crashed: 1 | Guard-failed: 1
Best single iteration: #3 replace moment.js with day.js (-39)
Worst outcome: #4 crash (build script)
Key insight: Dependency replacement yielded most gains; CSS inlining required guard-safe rework
```
|
||||