init

2026-04-12 01:06:31 +07:00
commit 10d660cbcb
1066 changed files with 228596 additions and 0 deletions
--- a/.opencode/skills/skill-creator/references/testing-and-iteration.md
+++ b/.opencode/skills/skill-creator/references/testing-and-iteration.md
@@ -0,0 +1,78 @@
+# Testing and Iteration
+
+## Testing Approaches
+
+Scale testing rigor to the skill's visibility; the more widely a skill is shared, the more systematic its testing should be:
+- **Manual testing:** Run queries in Claude.ai and observe the behavior. Fastest to iterate.
+- **Scripted testing:** Automate test cases in Claude Code for repeatable validation.
+- **Programmatic testing:** Build evaluation suites with the skills API for systematic testing.
+
+**Pro tip:** Iterate on a single challenging task until Claude succeeds, then extract the winning approach into the skill. Only then expand to multiple test cases.
+
+## Three Testing Areas
+
+### 1. Triggering Tests
+
+Ensure the skill loads when it should, and only then.
+
+| Should trigger | Should NOT trigger |
+|---|---|
+| "Help me set up a new ProjectHub workspace" | "What's the weather?" |
+| "I need to create a project in ProjectHub" | "Help me write Python code" |
+| "Initialize a ProjectHub project for Q4" | "Create a spreadsheet" |
+
+**Debug:** Ask Claude, "When would you use the [skill-name] skill?" Claude answers by quoting the skill's description back, which shows you the trigger conditions it is working from.
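+
+The triggering table can be kept as data and re-run after every description change. The sketch below assumes a hypothetical `skill_loaded(query)` callback that reports whether the skill fired for a query; how you observe that (a manual check, a Claude Code run, or an eval suite) depends on your setup. The keyword matcher in the demo is only a stand-in and is not how Claude decides to load a skill.
+
+```python
+from typing import Callable
+
+# Cases copied from the triggering table.
+SHOULD_TRIGGER = [
+    "Help me set up a new ProjectHub workspace",
+    "I need to create a project in ProjectHub",
+    "Initialize a ProjectHub project for Q4",
+]
+SHOULD_NOT_TRIGGER = [
+    "What's the weather?",
+    "Help me write Python code",
+    "Create a spreadsheet",
+]
+
+
+def evaluate_triggering(skill_loaded: Callable[[str], bool]) -> dict:
+    """Report where triggering went wrong.
+
+    `skill_loaded` is a hypothetical hook that returns True when the
+    skill was loaded for the given query.
+    """
+    missed = [q for q in SHOULD_TRIGGER if not skill_loaded(q)]
+    false_triggers = [q for q in SHOULD_NOT_TRIGGER if skill_loaded(q)]
+    return {
+        "missed": missed,                  # undertriggering cases
+        "false_triggers": false_triggers,  # overtriggering cases
+        "trigger_rate": 1 - len(missed) / len(SHOULD_TRIGGER),
+    }
+
+
+# Stand-in detector for the demo only: a naive keyword match.
+report = evaluate_triggering(lambda q: "projecthub" in q.lower())
+print(report)
+```
+
+With the stand-in matcher every case passes and the trigger rate is 1.0; swap in a real observation hook to get meaningful numbers.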
+
+### 2. Functional Tests
+
+Verify correct outputs:
+- Valid outputs generated
+- API/MCP calls succeed
+- Error handling works
+- Edge cases covered
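+
+Functional checks are easiest to automate against a recorded run. The sketch below assumes a hypothetical record format (`WorkflowRun`, `ToolCall`, and the `create_project` tool name are illustrative, not part of any real API) and covers the first two checks; error handling and edge cases are exercised by adding runs with deliberately bad inputs.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class ToolCall:
+    name: str        # hypothetical API/MCP tool name
+    succeeded: bool
+
+
+@dataclass
+class WorkflowRun:
+    output: dict                               # artifact the skill produced
+    calls: list = field(default_factory=list)  # ToolCall records, in order
+
+
+def functional_failures(run: WorkflowRun, required_keys: set) -> list:
+    """Return failure messages for one run; an empty list means it passed."""
+    failures = []
+    # Valid output: every required field must be present.
+    missing = required_keys - set(run.output)
+    if missing:
+        failures.append(f"output missing keys: {sorted(missing)}")
+    # API/MCP calls: none may have failed.
+    for call in run.calls:
+        if not call.succeeded:
+            failures.append(f"call failed: {call.name}")
+    return failures
+
+
+good = WorkflowRun({"project_id": "p-1", "name": "Q4"}, [ToolCall("create_project", True)])
+bad = WorkflowRun({"name": "Q4"}, [ToolCall("create_project", False)])
+print(functional_failures(good, {"project_id", "name"}))  # []
+print(functional_failures(bad, {"project_id", "name"}))
+```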
+
+### 3. Performance Comparison
+
+Run the same task with and without the skill and compare. For example:
+
+| Metric | Without Skill | With Skill |
+|---|---|---|
+| Messages needed | 15 back-and-forth | 2 clarifying questions |
+| Failed API calls | 3 retries | 0 |
+| Tokens consumed | 12,000 | 6,000 |
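+
+Each row of the comparison reduces to simple arithmetic: relative reduction = (without - with) / without. A minimal sketch using the table's figures, treating the "messages" row as a plain count even though the two cells describe different kinds of messages:
+
+```python
+# Figures copied from the comparison table.
+without_skill = {"messages": 15, "failed_api_calls": 3, "tokens": 12_000}
+with_skill = {"messages": 2, "failed_api_calls": 0, "tokens": 6_000}
+
+
+def relative_reduction(before: float, after: float) -> float:
+    """Fraction of the baseline saved: 0.5 means half as much."""
+    if before == 0:
+        raise ValueError("baseline must be non-zero")
+    return (before - after) / before
+
+
+for metric in without_skill:
+    cut = relative_reduction(without_skill[metric], with_skill[metric])
+    print(f"{metric}: {cut:.0%} reduction")
+```
+
+This prints an 87% reduction in messages, 100% in failed API calls, and 50% in tokens.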
+
+## Success Criteria
+
+### Quantitative
+- Skill triggers on about 90% of relevant queries (measure over a set of 10-20 test queries)
+- Completes the workflow in fewer tool calls than the same task without the skill
+- Zero failed API calls per workflow
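+
+The quantitative criteria can be checked mechanically after each test round. In this sketch the "~90%" target is treated as a hard 0.9 threshold, which is an assumption; the function name and arguments are illustrative.
+
+```python
+def meets_quantitative_criteria(
+    relevant_queries: int,
+    triggered: int,
+    tool_calls_with: int,
+    tool_calls_without: int,
+    failed_api_calls: int,
+    target_rate: float = 0.9,  # assumed hard threshold for "~90%"
+) -> bool:
+    """Check the three quantitative success criteria for one test round."""
+    if relevant_queries <= 0:
+        raise ValueError("need at least one relevant query (10-20 recommended)")
+    trigger_rate = triggered / relevant_queries
+    return (
+        trigger_rate >= target_rate
+        and tool_calls_with < tool_calls_without
+        and failed_api_calls == 0
+    )
+
+
+print(meets_quantitative_criteria(20, 18, 4, 9, 0))
+```
+
+For example, 18 of 20 relevant queries triggering, 4 tool calls versus 9 without the skill, and no failed calls passes; 17 of 20 would not.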
+
+### Qualitative
+- Users don't need to prompt Claude about next steps
+- Workflows complete without user correction
+- Consistent results across sessions
+- New users can accomplish the task on the first try
+
+## Iteration Signals
+
+### Undertriggering
+- Skill doesn't load when it should → add more trigger phrases and keywords to the description
+- Users enable it manually → the description is too vague
+
+### Overtriggering
+- Skill loads for unrelated queries → add negative triggers and make the description more specific
+- Users disable it → clarify the skill's scope in the description
+
+### Execution Issues
+- Inconsistent results → improve instructions and add validation scripts
+- API failures → add error handling and retry guidance
+- User corrections needed → make instructions more explicit
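+
+When triggering results come from a scripted run, the first signals can be flagged automatically: missed should-trigger queries indicate undertriggering, and hits on should-not-trigger queries indicate overtriggering. A minimal sketch; the function name and the `inconsistent_runs` counter are hypothetical:
+
+```python
+def iteration_signals(missed: list, false_triggers: list,
+                      inconsistent_runs: int = 0) -> list:
+    """Map observed test results to the iteration signals and fixes above."""
+    actions = []
+    if missed:
+        actions.append("undertriggering: add trigger phrases/keywords to the description")
+    if false_triggers:
+        actions.append("overtriggering: add negative triggers, narrow the description")
+    if inconsistent_runs:
+        actions.append("execution: improve instructions or add validation scripts")
+    return actions
+
+
+print(iteration_signals(["I need to create a project in ProjectHub"], []))
+```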
+
+## Iteration Workflow
+
+1. Use the skill on real tasks
+2. Note where Claude struggles, where the workflow is inefficient, and how many tokens it consumes
+3. Identify the updates needed to SKILL.md or its bundled resources
+4. Implement the changes
+5. Re-test with the same scenarios
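+
+Re-testing with the same scenarios is a regression check: compare pass/fail results before and after the change. A minimal sketch, with hypothetical scenario names:
+
+```python
+def compare_runs(before: dict, after: dict) -> dict:
+    """Compare pass/fail results for the same scenarios across two runs.
+
+    Both arguments map a scenario name to True (passed) or False (failed).
+    """
+    if before.keys() != after.keys():
+        raise ValueError("re-test with the same scenarios")
+    return {
+        "fixed": sorted(s for s in before if not before[s] and after[s]),
+        "regressed": sorted(s for s in before if before[s] and not after[s]),
+    }
+
+
+before = {"create project": True, "Q4 workspace": False, "bad input": False}
+after = {"create project": True, "Q4 workspace": True, "bad input": False}
+print(compare_runs(before, after))
+# {'fixed': ['Q4 workspace'], 'regressed': []}
+```
+
+A change is worth keeping when it fixes scenarios without regressing any others.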