english/.opencode/skills/skill-creator/references/testing-and-iteration.md
2026-04-12 01:06:31 +07:00


# Testing and Iteration
## Testing Approaches
Match testing rigor to how widely the skill will be used:
- **Manual testing** — Run queries in Claude.ai, observe behavior. Fast iteration.
- **Scripted testing** — Automate test cases in Claude Code for repeatable validation.
- **Programmatic testing** — Build eval suites via skills API for systematic testing.
**Pro tip:** Iterate on a single challenging task until Claude succeeds, then extract the winning approach into the skill. Expand to multiple test cases after.
## Three Testing Areas
### 1. Triggering Tests
Ensure the skill loads at the right times, and only then.
| Should trigger | Should NOT trigger |
|---|---|
| "Help me set up a new ProjectHub workspace" | "What's the weather?" |
| "I need to create a project in ProjectHub" | "Help me write Python code" |
| "Initialize a ProjectHub project for Q4" | "Create a spreadsheet" |
**Debug:** Ask Claude: "When would you use the [skill-name] skill?" — it quotes the description back.
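The table above can be turned into a small repeatable check. This is a minimal sketch, not a real skills API: `skill_triggered` is a hypothetical callable that you would back with whatever observation you have of the skill loading (manual notes, logs, or a scripted run), and `keyword_stub` is only a placeholder so the example runs on its own.

```python
from typing import Callable

# Queries from the triggering table; the flag says whether the skill should load.
TRIGGER_CASES = [
    ("Help me set up a new ProjectHub workspace", True),
    ("I need to create a project in ProjectHub", True),
    ("Initialize a ProjectHub project for Q4", True),
    ("What's the weather?", False),
    ("Help me write Python code", False),
    ("Create a spreadsheet", False),
]


def run_trigger_tests(skill_triggered: Callable[[str], bool]) -> list[str]:
    """Return the queries whose observed triggering differs from the expectation."""
    return [
        query
        for query, expected in TRIGGER_CASES
        if skill_triggered(query) != expected
    ]


def keyword_stub(query: str) -> bool:
    # Hypothetical stand-in: replace with a real observation of whether the skill loaded.
    return "projecthub" in query.lower()


mismatches = run_trigger_tests(keyword_stub)
print("mismatched queries:", mismatches)
```

An empty list means every query behaved as expected; any query that is returned points at either undertriggering or overtriggering.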
### 2. Functional Tests
Verify correct outputs:
- Valid outputs generated
- API/MCP calls succeed
- Error handling works
- Edge cases covered
### 3. Performance Comparison
Run the same task with and without the skill and compare:
| Metric | Without Skill | With Skill |
|---|---|---|
| Messages needed | 15 back-and-forth | 2 clarifying questions |
| Failed API calls | 3 retries | 0 |
| Tokens consumed | 12,000 | 6,000 |
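To make the comparison easier to track across iterations, the table can be reduced to percentage improvements. The sketch below uses the illustrative figures from the table; the dictionary keys are hypothetical names, and it assumes the "messages" row can be treated as a plain count.

```python
def percent_reduction(without_skill: float, with_skill: float) -> float:
    """Percentage by which the with-skill value improves on the baseline."""
    if without_skill <= 0:
        raise ValueError("baseline must be positive")
    return (without_skill - with_skill) / without_skill * 100


# Example figures from the comparison table (key names are assumptions).
baseline = {"messages": 15, "failed_api_calls": 3, "tokens": 12_000}
with_skill = {"messages": 2, "failed_api_calls": 0, "tokens": 6_000}

reductions = {
    metric: round(percent_reduction(baseline[metric], with_skill[metric]), 1)
    for metric in baseline
}
print(reductions)
# {'messages': 86.7, 'failed_api_calls': 100.0, 'tokens': 50.0}
```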
## Success Criteria
### Quantitative
- Skill triggers on ~90% of relevant queries (test 10-20 queries)
- Completes workflow in fewer tool calls than without skill
- 0 failed API calls per workflow
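The quantitative criteria above can be written as a simple pass/fail check. This is one possible sketch: `WorkflowStats` and its field names are hypothetical, and the 0.9 threshold mirrors the approximate 90% target, so adjust it to suit your skill.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStats:
    relevant_queries: int          # number of relevant test queries (10-20 suggested)
    triggered: int                 # how many of them loaded the skill
    tool_calls_with_skill: int
    tool_calls_without_skill: int
    failed_api_calls: int          # failures observed in the workflow


def check_quantitative(stats: WorkflowStats, min_trigger_rate: float = 0.9) -> dict[str, bool]:
    """Evaluate each quantitative criterion and report pass/fail per criterion."""
    if stats.relevant_queries <= 0:
        raise ValueError("need at least one relevant query")
    trigger_rate = stats.triggered / stats.relevant_queries
    return {
        "trigger_rate": trigger_rate >= min_trigger_rate,
        "fewer_tool_calls": stats.tool_calls_with_skill < stats.tool_calls_without_skill,
        "no_failed_api_calls": stats.failed_api_calls == 0,
    }


# Made-up example numbers, for illustration only.
stats = WorkflowStats(
    relevant_queries=20,
    triggered=19,
    tool_calls_with_skill=4,
    tool_calls_without_skill=9,
    failed_api_calls=0,
)
print(check_quantitative(stats))
```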
### Qualitative
- Users don't need to prompt Claude about next steps
- Workflows complete without user correction
- Consistent results across sessions
- New users can accomplish task on first try
## Iteration Signals
### Undertriggering
- Skill doesn't load when it should → add more trigger phrases/keywords to description
- Users manually enabling it → description too vague
### Overtriggering
- Skill loads for unrelated queries → add negative triggers, be more specific
- Users disabling it → clarify scope in description
### Execution Issues
- Inconsistent results → improve instructions, add validation scripts
- API failures → add error handling, retry guidance
- User corrections needed → make instructions more explicit
## Iteration Workflow
1. Use skill on real tasks
2. Note where Claude struggles, takes inefficient paths, or uses excessive tokens
3. Identify SKILL.md or resource updates needed
4. Implement changes
5. Test again with same scenarios
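
Step 5 is easier to judge when the before and after runs are compared scenario by scenario. A minimal sketch, assuming you record a pass/fail result per scenario; the scenario names and the `compare_runs` helper are hypothetical.

```python
def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Classify scenarios present in both runs by how their result changed."""
    shared = sorted(before.keys() & after.keys())
    return {
        "fixed": [s for s in shared if not before[s] and after[s]],
        "regressed": [s for s in shared if before[s] and not after[s]],
        "still_failing": [s for s in shared if not before[s] and not after[s]],
    }


# Hypothetical results: True means the scenario completed without user correction.
before = {"create workspace": False, "create project": True, "init Q4 project": False}
after = {"create workspace": True, "create project": True, "init Q4 project": False}

report = compare_runs(before, after)
print(report)
# {'fixed': ['create workspace'], 'regressed': [], 'still_failing': ['init Q4 project']}
```

Any entry under `regressed` suggests the latest change traded one problem for another and is worth revisiting before expanding the test set.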